Archive for October, 2012

Generating a clean MD5 sum check file in Python 3

October 18, 2012

This week I got stuck on a small problem: generating hash sums for validation. I've been working on an automation project at work focused on creating 100% hands-free testing tools. Part of my test called for some basic procedures, but I also wanted to verify the integrity of the data. The original script I was updating was written in Bash. It was pretty straightforward, but there was still room for improvement! I decided to revise the script and convert it to Python, which also makes for a more powerful tool given the versatility Python has over simple Bash scripts. src and dst are two parameters passed in from elsewhere in the script. Essentially they are strings pointing to a path on your hard disk, for example dst = "/tmp/blah/blah/".

import os
import shutil
import subprocess


def prepimage(src, dst):
    # Obtain sample files
    filename = os.path.join(dst, "md5sum.txt")
    print("Copying data and Generating md5sums")
    if not os.path.exists(dst):
        shutil.copytree(src, dst)

        # Generate md5sums for the freshly copied files
        list = subprocess.check_output(["ls", dst], universal_newlines=True)
        # Keep the first two names (the two sample files in this example)
        plist = list.split('\n')[0:2]
        with open(filename, "wt") as f:
            for item in plist:
                out = subprocess.Popen(['md5sum', item], universal_newlines=True,
                                       stdout=subprocess.PIPE, cwd=dst).communicate()[0]
                # Write md5sum's "<hash>  <name>" line for this file into md5sum.txt
                f.write(out)
    return 0

The if not os.path.exists(dst) block simply copies the sample data into dst if that directory doesn't exist yet.
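The existence check matters because shutil.copytree() creates dst itself and raises FileExistsError when that directory is already there. A minimal sketch of both options, assuming Python 3.8 or newer for the dirs_exist_ok parameter; copy_sample_data and its overwrite flag are hypothetical names, not part of the original script:

```python
import os
import shutil


def copy_sample_data(src, dst, overwrite=False):
    """Copy the sample files from src into dst; return True if a copy happened.

    overwrite=False mirrors the original check: copy only when dst is missing.
    overwrite=True uses dirs_exist_ok (Python 3.8+) to copy into an existing
    directory, replacing files that are already there.
    """
    if overwrite:
        shutil.copytree(src, dst, dirs_exist_ok=True)
        return True
    if not os.path.exists(dst):
        # copytree creates dst itself, so it must not exist beforehand
        shutil.copytree(src, dst)
        return True
    # A plain shutil.copytree(src, dst) here would raise FileExistsError
    return False
```

Keeping the default at overwrite=False preserves the original behaviour of only preparing the data once.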

The checksum generation starts with these two lines:

list = subprocess.check_output(["ls", dst], universal_newlines=True)
plist = list.split('\n')[0:2]

Using subprocess.check_output we kick off an ls command. dst is the argument we set earlier, pointing to the destination directory. This gives us output something like this:

'How fast.ogg\nJosh Woodward – Swansong.ogg\nmd5sum.txt\n'

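The second line splits the string on the \n delimiter and keeps only the first two entries (the [0:2] slice), which drops md5sum.txt and the trailing empty string to give us: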

['How fast.ogg', 'Josh Woodward – Swansong.ogg']
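To see exactly what the slice does, here is the same split run on the example string. The positional [0:2] only works because there are exactly two sample files and md5sum.txt happens to sort after them; filtering by name is a sturdier variant (a sketch, not the original script's code):

```python
# The string check_output() returned in the example above
listing = 'How fast.ogg\nJosh Woodward – Swansong.ogg\nmd5sum.txt\n'

parts = listing.split('\n')
print(parts)       # ['How fast.ogg', 'Josh Woodward – Swansong.ogg', 'md5sum.txt', '']
print(parts[0:2])  # ['How fast.ogg', 'Josh Woodward – Swansong.ogg']

# Filtering by name works no matter how many sample files there are;
# splitlines() also avoids the trailing empty string
plist = [name for name in listing.splitlines() if name != 'md5sum.txt']
print(plist)       # ['How fast.ogg', 'Josh Woodward – Swansong.ogg']
```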

Using this list, we open a new file, as I do with open(filename, "wt"), and kick off a for loop that runs md5sum against each entry and writes its output to the new file. The final output looks like this:

cat /tmp/optical-test/Ubuntu_Free_Culture_Showcase/md5sum.txt
6e34a2a0eaa61748ba3a33015a84e813  How fast.ogg
c9459a907b9345b289ba6c9e6517d4c2  Josh Woodward – Swansong.ogg
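md5sum writes each entry as the hex hash, two spaces, then the file name (its text-mode format). If you'd rather not shell out to ls and md5sum at all, the same file can be written with hashlib and os.listdir(), which also removes the need for the hard-coded [0:2] slice. A minimal sketch, where write_md5sum_file and md5_of_file are hypothetical helper names; it assumes every regular file in dst should be listed, and it ignores md5sum's special escaping of names that contain backslashes or newlines:

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps calling f.read() until end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def write_md5sum_file(dst, checksum_name="md5sum.txt"):
    """Write an md5sum-compatible checksum file covering every regular file in dst."""
    # List the files first, excluding the checksum file itself
    names = sorted(
        name for name in os.listdir(dst)
        if name != checksum_name and os.path.isfile(os.path.join(dst, name))
    )
    with open(os.path.join(dst, checksum_name), "wt", encoding="utf-8") as out:
        for name in names:
            # md5sum text-mode line: <hex digest>, two spaces, <file name>
            out.write("%s  %s\n" % (md5_of_file(os.path.join(dst, name)), name))
    return names
```

Reading in chunks keeps memory use flat even for large files such as disc images. Running md5sum -c md5sum.txt inside dst afterwards should report each listed file as OK.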

On the flip side, you can automate the integrity check by creating a new function and adding:

import subprocess

# Verify md5checksum: md5sum -c re-hashes every file listed in md5sum.txt
checkoutput = subprocess.Popen(['md5sum', '-c', 'md5sum.txt'],
                               universal_newlines=True, stdout=subprocess.PIPE,
                               cwd='/media/CDROM/').communicate()[0]
print(checkoutput)

Which should output:

How fast.ogg: OK
Josh Woodward – Swansong.ogg: OK
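Popen(...).communicate() only hands back the text, so an automated test would still have to search it for "OK". GNU md5sum -c also reports the result through its exit status (non-zero if any listed file is missing or does not match), and subprocess.run() (Python 3.5+) exposes that as returncode. A minimal sketch, assuming md5sum is on the PATH; verify_with_md5sum is a hypothetical helper name:

```python
import subprocess


def verify_with_md5sum(directory, checksum_file="md5sum.txt"):
    """Run `md5sum -c` inside directory and return (all_ok, report_text)."""
    result = subprocess.run(
        ["md5sum", "-c", checksum_file],
        cwd=directory,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold md5sum's warnings into the report
        universal_newlines=True,
    )
    # md5sum -c exits non-zero if any listed file is missing or mismatched
    return result.returncode == 0, result.stdout
```

Calling verify_with_md5sum('/media/CDROM/') runs the same check as the snippet above, and a hands-free test can simply fail whenever the first value is False.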



Python 3 can also generate the hash sum itself with the hashlib module. If you don't need a checksum file, here's some alternate code you can use!

import hashlib

filename = "/tmp/file1.txt"
# Read the whole file as bytes (fine for small files)
with open(filename, 'rb') as file:
    filedata = file.read()
md5 = hashlib.md5()
md5.update(filedata)
md5sum = md5.hexdigest()
print(md5sum)

This would print just the hash sum: