Verify Integrity of Files

When downloading something from a website, you've probably seen some obscure things like checksums, MD5, SHA, or SHASUMS beside or around the DOWNLOAD button. To the layman, those things look strange, don't they?

But actually, they are your good friends! Those are known as checksums or checksum files. They are there to help you verify that your downloaded file is bit-for-bit identical to the one published by the website, and not a broken download.

It's actually very simple to verify a file checksum. Simply use an app to verify that your file's checksum matches the website file's checksum.

In this guide, you will learn how to do this. I'm going to avoid jargon so that you can follow along. >.<

Three simple things to know

For beginners, a checksum is simply a number.

Checksum:

  • A checksum is a short hexadecimal number calculated from a given file's content, using a checksum algorithm.
  • A checksum is assumed to be unique to a given file:
    • Two identical files have the same checksum
    • Two different files have different checksums
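
To make this concrete, here is a minimal sketch in Python, using its standard hashlib module. Python is only my choice for illustration and none of the tools in this guide require it. The helper name checksum_of and the sample content are made up for the example.

```python
import hashlib


def checksum_of(content: bytes) -> str:
    """Return the SHA256 checksum of some content as a hexadecimal string."""
    return hashlib.sha256(content).hexdigest()


original = b"hello world\n"
exact_copy = b"hello world\n"
damaged_copy = b"hello w0rld\n"  # a single character differs

print(checksum_of(original) == checksum_of(exact_copy))    # True
print(checksum_of(original) == checksum_of(damaged_copy))  # False
```

Even a one-character change produces a completely different checksum, which is why comparing checksums is enough to spot a broken download.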

Checksum algorithm:

  • A checksum algorithm is a mathematical formula.
  • Some common examples: MD5, SHA1, SHA256, SHA512 etc.
  • Presently, MD5 and SHA1 are considered broken because they have been shown to produce the same checksum for two different files. That means a file that passes verification may be in one of two states: 1) intact, or 2) altered (for example, deliberately crafted by an attacker) in a way that our checksum verification does not detect. That is bad! Therefore, it is recommended to use SHA256 or above.
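
One visible difference between these algorithms is the length of the checksum they produce. The following sketch (Python's standard hashlib again, with placeholder content that I made up) computes a checksum of the same data with each algorithm and records how many hexadecimal digits come out.

```python
import hashlib

data = b"example file content"  # placeholder content, purely for illustration

# hashlib uses lowercase names without punctuation for these algorithms.
digest_lengths = {}
for name in ("md5", "sha1", "sha256", "sha512"):
    checksum = hashlib.new(name, data).hexdigest()
    digest_lengths[name] = len(checksum)
    print(name, len(checksum), checksum)
```

MD5 gives 32 hexadecimal digits, SHA1 gives 40, SHA256 gives 64, and SHA512 gives 128. This is a handy way to guess which algorithm a website used when it shows a checksum without naming the algorithm.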

Hexadecimal:

  • Humans use decimal, which is a base-10 numeric system (digits 0 to 9). Hexadecimal is a base-16 numeric system (digits 0 to 9, then A to F). For example, counting up from 0:
    • Decimal and hexadecimal are identical from 0 to 9
    • Decimal 10 is hexadecimal A
    • Decimal 11 is hexadecimal B
    • ...
    • Decimal 15 is hexadecimal F
    • Finally, decimal 16 is hexadecimal 10 (the single digits have run out, so a second digit is added)
    • As you can see, A to F are really just digits in hexadecimal
  • You have probably seen HTML color codes like color: #ffa0c2. Now you know what they are! They are just 3 hexadecimal numbers: ff is decimal 255 (red), a0 is decimal 160 (green), and c2 is decimal 194 (blue). The same color code can also be written as color: rgb(255, 160, 194).
  • Hexadecimal is case-insensitive. For example, a is the same as A, f is the same as F, and a0b1c2d3e4f5 is the same as A0B1C2D3E4F5.
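
If you want to check these conversions yourself, here is a small Python sketch. It relies only on the built-in int function, which can read a string as a hexadecimal number when given base 16. The variable names are mine.

```python
# int(text, 16) reads a string as a hexadecimal number.
print(int("A", 16))   # 10
print(int("F", 16))   # 15
print(int("10", 16))  # 16

# Split the color code from the example above into its three parts.
color = "ffa0c2"
red, green, blue = (int(color[i:i + 2], 16) for i in (0, 2, 4))
print(red, green, blue)  # 255 160 194

# Upper and lower case letters mean the same number.
same_value = int("a0b1c2d3e4f5", 16) == int("A0B1C2D3E4F5", 16)
print(same_value)  # True
```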

So... A checksum is simply a hexadecimal number that we need to verify.

Using QuickHash-GUI

I personally like QuickHash-GUI, a cross-platform (Windows, Mac, and Linux) freeware tool that can verify file checksums, among many other things. We're going to use Windows for this demo, but the steps are the same on other operating systems (Mac and Linux).

Step 1: Download QuickHash-GUI

First, download QuickHash-GUI v3.x.x (for Windows, Mac, and Linux):

Save the file into your Downloads folder:

Extract the file. E.g. C:\Users\user\Downloads\QuickHash-GUI-Windows-v3.3.1:

Navigate into the 64-bit folder, and launch Quickhash-GUI_x64.exe:

Now we are going to download a file to practice verifying its checksum.

Step 2: Download a file

Visit the Wireshark website and, under Download Wireshark, click on Windows PortableApps (64bit) to download:

Save the file to your Downloads folder:

Step 3: Verify the downloaded file

Now in Explorer, drag-and-drop the downloaded file (i.e. in my case, WiresharkPortable64_4.0.4.paf.exe) into the QuickHash-GUI window:

In QuickHash-GUI, on the left side under Algorithm, select SHA256. See that QuickHash-GUI shows the calculated SHA256 checksum (in my case, it is D72789CE7CA3715C044AC0913BA0603DF89699EBB6F3839547D64AC1FD9A1518):

Now we only need to verify that our file's checksum matches Wireshark's published checksum.

Now return to the Wireshark website, scroll down to the Verify Downloads section, and click on the signatures file:

You will see the checksums of all files published by Wireshark. Find the file you downloaded (i.e. in my case, WiresharkPortable64_4.0.4.paf.exe), and copy its SHA256 checksum:

Now go back to QuickHash-GUI, click the Clear Hash Field button, then paste the copied checksum into the Expected Hash value box. You should see Expected hash MATCHES the computed file hash!, meaning the checksums match and the file is verified!

You may click OK to finish.
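
Conceptually, this kind of check boils down to two steps: compute the file's checksum, then compare it with the expected one. Below is a minimal Python sketch of that idea. It is not how QuickHash-GUI is actually implemented, and the function names sha256_of_file and matches are hypothetical names I picked for illustration.

```python
import hashlib


def sha256_of_file(path: str) -> str:
    """Compute the SHA256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory use low, even for very large files.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str) -> bool:
    """Compare case-insensitively, since hexadecimal ignores letter case."""
    return sha256_of_file(path).lower() == expected.strip().lower()
```

For the file in this guide, one would call matches("WiresharkPortable64_4.0.4.paf.exe", "D72789CE7CA3715C044AC0913BA0603DF89699EBB6F3839547D64AC1FD9A1518") from the Downloads folder and expect True if the download is intact.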

Using the Command Line

Instead of using a Graphical User Interface (GUI) tool like QuickHash-GUI in Step 1, we may also validate checksums using the command line.

Assuming we've completed Step 2, we just need to verify that our downloaded file checksum is D72789CE7CA3715C044AC0913BA0603DF89699EBB6F3839547D64AC1FD9A1518.

Windows

For Windows, we can use Windows PowerShell.

Click Start and open PowerShell:

Now type the following, pressing Enter after each line:

While typing, you can press TAB to autocomplete file names, parameters, and parameter values. Try it, it greatly speeds up typing.

cd Downloads
Get-FileHash WiresharkPortable64_4.0.4.paf.exe -Algorithm SHA256

You will see an output showing the SHA256 checksum. See that it matches the expected value:

If you can't trust your eyes, you can verify programmatically by running the following. If the checksum matches, you will see the same output as before; if it does not match, nothing is printed:

cd Downloads
Get-FileHash WiresharkPortable64_4.0.4.paf.exe -Algorithm SHA256 | Where-Object {
    $_.Hash -eq 'D72789CE7CA3715C044AC0913BA0603DF89699EBB6F3839547D64AC1FD9A1518'
}

Congratulations, the downloaded file is verified.

Mac

For Mac, we can use zsh or bash.

Open Terminal, and type the following:

While typing, you can press TAB to autocomplete and list file names. Try it, it greatly speeds up typing.

cd Downloads
shasum -a 256 WiresharkPortable64_4.0.4.paf.exe

You will see an output showing the SHA256 checksum. See that it matches the expected value:

If you can't trust your eyes, you can verify programmatically by running the following. You should see the message WiresharkPortable64_4.0.4.paf.exe: OK:

Note the two spaces between the checksum and the file name.

cd Downloads
echo 'd72789ce7ca3715c044ac0913ba0603df89699ebb6f3839547d64ac1fd9a1518  WiresharkPortable64_4.0.4.paf.exe' | shasum -a 256 -c -

Congratulations, the downloaded file is verified.

Linux

For Linux (e.g. Ubuntu), we can use sh or bash.

Open Terminal, and type the following:

While typing, you can press TAB to autocomplete and list file names. Try it, it greatly speeds up typing.

cd Downloads
sha256sum WiresharkPortable64_4.0.4.paf.exe

You will see an output showing the SHA256 checksum. See that it matches the expected value:

If you can't trust your eyes, you can verify programmatically by running the following. You should see the message WiresharkPortable64_4.0.4.paf.exe: OK:

Note the two spaces between the checksum and the file name.

cd Downloads
echo 'd72789ce7ca3715c044ac0913ba0603df89699ebb6f3839547d64ac1fd9a1518  WiresharkPortable64_4.0.4.paf.exe' | sha256sum -c -

Congratulations, the downloaded file is verified.

Cheat sheet

We now know how to verify a file.

Here's a quick cheat sheet you can copy and paste into your notebook.

  1. Download the file from the website
  2. Find the file's checksum on the website
  • Try to look for SHA256 or above. If none is available, fall back to MD5 or SHA1.
  • If the website provides the checksum directly, copy it to the clipboard, e.g. d72789ce7ca3715c044ac0913ba0603df89699ebb6f3839547d64ac1fd9a1518.
  • If the website provides checksum files, e.g. MD5SUMS, SHASUMS, SHA256SUMS, SHA512SUMS, .md5, .sha1, .sha256, .sha512, download the checksum file and open it in a text editor, look for the downloaded file's checksum, and copy it to the clipboard.
  3. Verify checksum
  • QuickHash-GUI:

    • Drag and drop file into QuickHash-GUI
    • In QuickHash-GUI, on the left side under Algorithm, select the algorithm, e.g. MD5, SHA1, SHA256, or SHA512.
    • Paste the website's provided checksum in the Expected Hash value box to verify.
  • Windows PowerShell:

    Get-FileHash .\path\to\file.zip -Algorithm MD5
    Get-FileHash .\path\to\file.zip -Algorithm SHA1
    Get-FileHash .\path\to\file.zip -Algorithm SHA256
    Get-FileHash .\path\to\file.zip -Algorithm SHA512
    
  • macOS Terminal:

    md5 ./path/to/file.zip
    shasum -a 1 ./path/to/file.zip
    shasum -a 256 ./path/to/file.zip
    shasum -a 512 ./path/to/file.zip
    
  • Linux Terminal:

    md5sum ./path/to/file.zip
    sha1sum ./path/to/file.zip
    sha256sum ./path/to/file.zip
    sha512sum ./path/to/file.zip
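
If you would rather not hunt through a checksum file by eye, the lookup in step 2 can be sketched in Python. This assumes the common layout that sha256sum and shasum produce, one line per file with the checksum first and the file name after it. Some websites use a different layout, in which case this sketch will not find anything. The function name find_checksum and the sample content are made up for illustration.

```python
def find_checksum(checksum_text: str, file_name: str):
    """Return the checksum listed for file_name, or None if it is not listed."""
    for line in checksum_text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        checksum, name = parts
        name = name.strip()
        # A leading '*' marks binary mode in this layout; it is not part of the name.
        if name.startswith("*"):
            name = name[1:]
        if name == file_name:
            return checksum.lower()
    return None


# Made-up example content; these checksums are placeholders, not real ones.
example = (
    "1111111111111111111111111111111111111111111111111111111111111111  file-a.zip\n"
    "2222222222222222222222222222222222222222222222222222222222222222 *file-b.zip\n"
)
print(find_checksum(example, "file-b.zip"))
```

The checksum it returns can then be pasted into QuickHash-GUI or compared against the output of the commands above.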
    

Regarding trust

If you followed through, you would have learnt how to verify a file's integrity. This means we know that our file is bit-for-bit the same as the original file on the website or server.

However, verifying a file's integrity (correct data) does not verify its authenticity (correct sender). In layman's terms, just because you received an exact copy of a file doesn't mean you received it from the correct source. If the website is a fake (spoofed) one, offering a fake (spoofed) file along with a matching checksum, the checksum does not protect us.

So do we trust the website most of the time? It will take another article to explain things, but in simple terms, if your browser shows a Green Lock icon and the web address starts with https:// instead of http://, you can assume data authenticity: the website belongs to whoever owns that web address, because only the owner can get a Certificate for it. The connection is encrypted with the help of this Certificate, so the file is transferred without modification. Think of Certificates as a centralized identity, like a passport: a public identity that we trust because we first trust the passport issuer.

And in case you are interested, there's the mess of GPG and PGP. Think of this as a decentralized identity: everybody can publish their own identity, and the more people trust an identity, the more likely it can be trusted. This paradigm has not been very successful because it's difficult for the layman to know how to even "get started" trusting anyone, compared to the centralized identity where the layman just has to trust the passport issuer.

Final thoughts

It's important to understand how to verify files because, even if one trusts a secure website (e.g. one that starts with https://) with a Green Lock in the browser, there is still no certainty that the file arrived intact. In the very worst case, a trojan on the computer might have modified the downloaded file just before one opens it. To be sure that a file is identical to the original, always verify its checksum. It is good practice, and it's a very simple thing to do.

Advanced users might already know everything in this article. But a ton of normal people don't.