Comparing files via checksum with PowerShell
August 1st, 2009
Have you ever had to push gigabytes of files across the network, knowing that the same version of many of those files already existed on the other end, but without an immediate way of checking which ones were already in place?
If so, we can mitigate that problem somewhat with checksums.
Below is a simple function that will give you the checksum for a given file. For the sake of space and time, I’ve kept the function pretty minimal.
function Get-Checksum($file, $crypto_provider) {
    if ($crypto_provider -eq $null) {
        $crypto_provider = new-object 'System.Security.Cryptography.MD5CryptoServiceProvider';
    }
    $file_info = get-item $file;
    trap { continue }
    $stream = $file_info.OpenRead();
    if ($? -eq $false) {
        return $null;
    }
    $bytes = $crypto_provider.ComputeHash($stream);
    $checksum = '';
    foreach ($byte in $bytes) {
        $checksum += $byte.ToString('x2');
    }
    $stream.close() | out-null;
    return $checksum;
}
Let’s look at it in further detail.
It takes the full path to a file as its first argument, and optionally a crypto provider of your choice as its second. If no provider is specified, it defaults to the MD5 crypto provider.
function Get-Checksum($file, $crypto_provider) {
    if ($crypto_provider -eq $null) {
        $crypto_provider = new-object 'System.Security.Cryptography.MD5CryptoServiceProvider';
    }
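As a rough illustration of what the optional argument buys you, the sketch below hashes the same bytes with two different algorithms. Note that it uses the static Create() factory methods rather than the provider class named above; that is my own substitution for the sake of portability across .NET versions, and the sample input is made up. Any HashAlgorithm instance should be usable with ComputeHash in the same way.

```powershell
# Sample input (an assumption for illustration); any byte array or stream would do.
$data = [System.Text.Encoding]::ASCII.GetBytes('hello')

# Two interchangeable HashAlgorithm instances.
$md5    = [System.Security.Cryptography.MD5]::Create()
$sha256 = [System.Security.Cryptography.SHA256]::Create()

# MD5 digests are 16 bytes long, SHA-256 digests are 32 bytes long.
$md5_length    = $md5.ComputeHash($data).Length
$sha256_length = $sha256.ComputeHash($data).Length
```

Passing the second object as $crypto_provider should make Get-Checksum return a 64-character checksum instead of a 32-character one.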
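Next, get the file, then attempt to open it for reading as a stream. The trap block swallows any exception thrown by OpenRead(), and the check of $? tells us whether the call succeeded. If we are unable to open the file, the function simply returns null instead of processing it further.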
    $file_info = get-item $file;
    trap { continue }
    $stream = $file_info.OpenRead();
    if ($? -eq $false) {
        return $null;
    }
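If you are on PowerShell 2.0 or later, the same "return null when the file cannot be opened" behaviour can be sketched with try/catch instead of trap. This is only an alternative way of writing it, not the function above; the helper name Open-ReadStreamOrNull is hypothetical.

```powershell
# Hypothetical helper, not part of Get-Checksum.
# Requires PowerShell 2.0 or later for try/catch.
function Open-ReadStreamOrNull($path) {
    try {
        # .NET resolves relative paths against the process working
        # directory, so it is safest to pass a full path.
        return [System.IO.File]::OpenRead($path)
    }
    catch {
        # Missing file, locked file, access denied, and so on.
        return $null
    }
}

$stream = Open-ReadStreamOrNull 'C:\test.log'
if ($stream -ne $null) {
    $stream.Close()
}
```

The catch block only covers the statements inside the try, which makes it a little easier to see which failures are being ignored.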
Next, call the ComputeHash method of the provider on the stream to get a byte array, and start with an empty checksum string. Convert each byte to two-digit hexadecimal and append it to the checksum.
    $bytes = $crypto_provider.ComputeHash($stream);
    $checksum = '';
    foreach ($byte in $bytes) {
        $checksum += $byte.ToString('x2');
    }
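To make the hexadecimal step concrete, here is a small sketch that runs the same 'x2' conversion over four hand-picked bytes. The sample bytes are my own, and the unary -join operator is a PowerShell 2.0 shorthand for the loop above rather than what the function itself uses.

```powershell
# Four sample bytes chosen to show the zero padding.
$bytes = [byte[]](0, 15, 16, 255)

# 'x2' means lowercase hexadecimal, padded to at least two digits.
# The unary -join operator needs PowerShell 2.0 or later.
$checksum = -join ($bytes | ForEach-Object { $_.ToString('x2') })

$checksum   # 000f10ff
```

Without the padding, 0 and 15 would each come out as a single character and checksums would no longer have a fixed length.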
Finally, close the stream and return the checksum.
    $stream.close() | out-null;
    return $checksum;
}
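One thing the minimal version does not guarantee is that the stream gets closed if ComputeHash itself fails. A possible way to cover that, assuming PowerShell 2.0 or later, is a try/finally block. The function name Get-StreamChecksum is hypothetical, and MD5 is hard-coded to keep the sketch short.

```powershell
# Hypothetical variant, not the Get-Checksum function from this post.
function Get-StreamChecksum($path) {
    $md5 = [System.Security.Cryptography.MD5]::Create()
    $stream = [System.IO.File]::OpenRead($path)
    try {
        $bytes = $md5.ComputeHash($stream)
        return -join ($bytes | ForEach-Object { $_.ToString('x2') })
    }
    finally {
        # Runs whether or not ComputeHash threw.
        $stream.Close()
    }
}

# Example call (the path is only an illustration):
# Get-StreamChecksum 'C:\test.log'
```

Unlike the original, this variant lets an error from OpenRead reach the caller instead of returning null.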
To see it in action, try something like this.
PS > Get-Checksum 'C:\test.log'
f0cdf2d3e32233c11e1e8c6c0190cffc
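If you are on PowerShell 4.0 or later, the built-in Get-FileHash cmdlet should give you the same value without a custom function. The sketch below writes a small throwaway file of its own so that it does not depend on C:\test.log; the file contents are an assumption for illustration.

```powershell
# Create a small throwaway file so the example is self-contained.
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($path, [System.Text.Encoding]::ASCII.GetBytes('hello'))

# Get-FileHash defaults to SHA256, so ask for MD5 explicitly.
# Its Hash property is uppercase hex; lower it to match Get-Checksum.
$checksum = (Get-FileHash -Path $path -Algorithm MD5).Hash.ToLower()

Remove-Item $path
$checksum   # 5d41402abc4b2a76b9719d911017c592
```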
Great, so what can we do with this now?
For a very simple example, let’s say that we have two directories, ‘C:\content_current’ and ‘C:\content_new’, and we want to see which files have been updated in ‘content_new’ and need to be copied into ‘content_current’.
PS > $files_current = get-childitem 'c:\content_current'
PS > $files_new = get-childitem 'c:\content_new'
PS > $files_current
    Directory: C:\content_current

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         4/30/2009  11:09 AM         18 test1.txt
-a---          8/1/2009  12:37 AM        886 test2.txt
-a---          8/1/2009  12:38 AM        173 test3.txt
-a---          8/1/2009  12:40 AM         23 test4.txt

PS > $files_new
    Directory: C:\content_new

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         4/30/2009  11:09 AM         18 test1.txt
-a---        12/19/2007  10:39 PM        587 test2.txt
-a---          8/1/2009  12:36 AM        173 test3.txt
-a---          8/1/2009  12:58 AM        111 test5.txt

PS > $checksums_current = @{}
PS > $checksums_new = @{}
PS > $files_current | foreach { $checksums_current.($_.name) = get-checksum $_.fullname }
PS > $files_new | foreach { $checksums_new.($_.name) = get-checksum $_.fullname }
PS > $checksums_current
Name                           Value
----                           -----
test4.txt                      64f97c57ed1fee1c8f6a469d4e3e59ee
test1.txt                      b7f01f335ab05254f8e2a4e7a43bc8c5
test3.txt                      821825877ac67a42e8cfa1a0a027f5b5
test2.txt                      a1f5410d7c8be3094d52efc4b8f0b889

PS > $checksums_new
Name                           Value
----                           -----
test5.txt                      87ac0df60f3103dbb45848257b06adf6
test1.txt                      b7f01f335ab05254f8e2a4e7a43bc8c5
test3.txt                      588137a10d603c34b1098a2c310508c7
test2.txt                      b5bd85a76345398e14aab046f8d7aed3

So which files have changed in ‘content_new’?
PS > $checksums_new.keys | where { $checksums_new.$_ -ne $checksums_current.$_ }
test5.txt
test3.txt
test2.txt
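Note that the comparison only walks the keys of $checksums_new, so test4.txt, which exists only in ‘content_current’, is never reported. The sketch below wraps the same comparison in a function and adds that second check. It rebuilds the two hashtables from the values shown above so that it can be run on its own; the function names are hypothetical.

```powershell
# Checksums copied from the listings above.
$checksums_current = @{
    'test1.txt' = 'b7f01f335ab05254f8e2a4e7a43bc8c5'
    'test2.txt' = 'a1f5410d7c8be3094d52efc4b8f0b889'
    'test3.txt' = '821825877ac67a42e8cfa1a0a027f5b5'
    'test4.txt' = '64f97c57ed1fee1c8f6a469d4e3e59ee'
}
$checksums_new = @{
    'test1.txt' = 'b7f01f335ab05254f8e2a4e7a43bc8c5'
    'test2.txt' = 'b5bd85a76345398e14aab046f8d7aed3'
    'test3.txt' = '588137a10d603c34b1098a2c310508c7'
    'test5.txt' = '87ac0df60f3103dbb45848257b06adf6'
}

# Hypothetical helper: names that are new or whose checksum differs.
function Get-ChangedName($new, $current) {
    $new.Keys | Where-Object { $new[$_] -ne $current[$_] } | Sort-Object
}

# Hypothetical helper: names that exist only in the current set.
function Get-RemovedName($new, $current) {
    $current.Keys | Where-Object { -not $new.ContainsKey($_) } | Sort-Object
}

$changed = @(Get-ChangedName $checksums_new $checksums_current)
$removed = @(Get-RemovedName $checksums_new $checksums_current)

$changed   # test2.txt, test3.txt, test5.txt
$removed   # test4.txt
```

The Sort-Object call is only there to make the output order predictable, since hashtable keys come back in no particular order.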
This seems like a lot of work for what you get, until you consider situations of a much larger scale. Imagine that you need to push a multi-gigabyte content directory from one server to a collection of 1,200 servers spread around the globe. It is in such instances that a similar comparison will save you a great deal of time, effort, and bandwidth.
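For that larger-scale case, one possible approach is to build a manifest of relative path to checksum for a whole directory tree, so that only the small manifest has to travel between machines before any files do. The sketch below is one way to do that, assuming PowerShell 2.0 or later and MD5; the function name Get-ChecksumManifest is hypothetical.

```powershell
# Hypothetical helper: returns a hashtable of relative path -> MD5 checksum
# for every file under $root, including subdirectories.
function Get-ChecksumManifest($root) {
    $md5 = [System.Security.Cryptography.MD5]::Create()
    # Normalise the root so relative paths can be cut out of FullName.
    $root_path = (Resolve-Path $root).ProviderPath.TrimEnd([char[]]'\/')
    $manifest = @{}
    Get-ChildItem $root_path -Recurse |
        Where-Object { -not $_.PSIsContainer } |
        ForEach-Object {
            $relative = $_.FullName.Substring($root_path.Length + 1)
            $stream = $_.OpenRead()
            try {
                $bytes = $md5.ComputeHash($stream)
            }
            finally {
                $stream.Close()
            }
            $manifest[$relative] = -join ($bytes | ForEach-Object { $_.ToString('x2') })
        }
    return $manifest
}

# Example calls (the paths are only illustrations):
# $checksums_new     = Get-ChecksumManifest 'C:\content_new'
# $checksums_current = Get-ChecksumManifest 'C:\content_current'
```

The resulting hashtables can be compared in the same way as the ones built by hand earlier.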





Or you can just install rsync. Free (as in freedom and as in beer), and more clever than md5sum’ing whole files.
Thanks for your comments. That is a great suggestion; however, it is worth pointing out that the aim of the article is to show one way of generating checksums and/or comparing files with PowerShell, hence the title. The story is simply a hypothetical situation, and perhaps I could come up with a better example than the one that I used here.