During a recent audit, I ran into something interesting while reviewing a script as part of a control related to data integrity. The script performed a simple ETL (Extract, Transform, Load) function on tables of data sent and retrieved over a secure FTP connection from their customer’s server.
As I wallowed in geek heaven, deconstructing the code and the intricacies of their ETL process, I ran into a really groovy algorithm. I asked the author of the script what the algorithm was for, and he said it performed a Cyclic Redundancy Check (CRC).
I remembered reading up on CRCs while studying for the CISA, but had never encountered one in the wild. I figured now was as good a time as any to dive in and learn more!
What is a Cyclic Redundancy Check?
A CRC is a type of checksum. A checksum is a small value computed from a block of data and used to detect errors or corruption, both when the data is transmitted and when it sits at rest on a hard drive. The CRC is typically sent alongside the data, for example as a “crc.list” file, so the receiver can recompute it and compare.
Checksums are especially handy for verifying that data sent over noisy connections are free of errors that might be introduced due to network interference, line noise or distortion.
A Cyclic Redundancy Check is special in that it detects errors more reliably than simpler checksums such as a parity bit or a plain sum of bytes. A CRC achieves this by treating the data as one long binary polynomial and dividing it by a fixed generator polynomial using modulo-2 arithmetic (where subtraction is just XOR). The remainder of that division is the CRC (the checksum value).
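To make the division idea concrete, here is a minimal Python sketch. It assumes the widely used CRC-32 parameters (the same ones behind Python's `zlib.crc32`); the script I audited may well have used a different variant. The sample record and the helper names `crc32_bitwise` and `parity_bit` are made up for illustration.

```python
import zlib

# Reflected form of the common CRC-32 generator polynomial (an assumption:
# this is the zlib/Ethernet variant, not necessarily the one in the audited script).
POLY = 0xEDB88320

def crc32_bitwise(data: bytes) -> int:
    """Compute CRC-32 one bit at a time (long division with XOR as subtraction)."""
    crc = 0xFFFFFFFF  # conventional initial value for this variant
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                # Low bit set: "subtract" (XOR) the generator polynomial.
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # conventional final inversion

def parity_bit(data: bytes) -> int:
    """Single even-parity bit over every bit in the data."""
    return sum(bin(b).count("1") for b in data) % 2

original = b"account=1001;amount=250.00"  # hypothetical record

# Flip two separate bits to simulate line noise.
damaged = bytearray(original)
damaged[0] ^= 0b00000001
damaged[5] ^= 0b00000100
damaged = bytes(damaged)

matches_stdlib = crc32_bitwise(original) == zlib.crc32(original)
parity_missed_it = parity_bit(original) == parity_bit(damaged)
crc_caught_it = crc32_bitwise(original) != crc32_bitwise(damaged)

print(matches_stdlib, parity_missed_it, crc_caught_it)
```

Two flipped bits cancel each other out in a single parity bit, so parity sees nothing wrong, while the CRC remainder changes. In practice you would call `zlib.crc32` directly; the bit-by-bit loop is only there to show the division.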
Going into the math used to generate a CRC is beyond the scope of this post, but this graphic does a good job of illustrating the process as I have understood it. Please help me improve it if you can; I am not a mathematician!
Considerations for the Auditor
Now we get to the important part. Why do we, as auditors and managers, care? Understanding CRCs, and checksums in general, helps us judge when they should and should not be used within a given process.
When Use of CRC and various other Checksums is Appropriate
Use checksums when you need to verify the integrity of data against accidental corruption, particularly during transmission from one system to another. For example:
- Verifying the integrity of software installation files downloaded over the internet.
- Verifying the integrity of data sent and received electronically (e.g. over FTP).
- Checking for transmission errors in network traffic, as many network protocols do (Ethernet frames, for example, carry a CRC-32).
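A file-transfer check along these lines can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names `file_crc32` and `verify_transfer` are hypothetical, the sample data is invented, and the sender's CRC is assumed to arrive by some agreed route (such as a companion file).

```python
import os
import tempfile
import zlib

def file_crc32(path, chunk_size=65536):
    """Return the CRC-32 of a file, reading in chunks to keep memory use flat."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # carry the running CRC across chunks
    return crc

def verify_transfer(path, expected_crc):
    """True if the received file's CRC-32 matches the value the sender supplied."""
    return file_crc32(path) == expected_crc

# Demo with made-up data: the sender computes a CRC before sending.
payload = b"id,amount\n1,250.00\n2,99.95\n"
sender_crc = zlib.crc32(payload)

# Simulate the received file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    received_path = tmp.name

intact = verify_transfer(received_path, sender_crc)

# Simulate corruption in transit by overwriting one byte.
with open(received_path, "r+b") as f:
    f.seek(3)
    f.write(b"X")

damaged = verify_transfer(received_path, sender_crc)
os.remove(received_path)

print(intact, damaged)
```

The point for an auditor is the comparison step: a CRC that is computed but never compared against the sender's value is not a working control.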
When Use of CRC and various other Checksums is Not Appropriate
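Checksums should not be used to verify the authenticity of the data or the sender, or to produce a secure, cryptographic hash. A checksum involves no secret, so anyone who alters the data can simply recompute a matching checksum.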
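A short Python sketch shows the difference between detecting accidents and detecting tampering. The contrast with a keyed HMAC is my own addition for illustration, not something from the audited script, and the key and messages are invented.

```python
import hashlib
import hmac
import zlib

# Hypothetical shared secret; an attacker on the wire does not know it.
SECRET_KEY = b"known-only-to-sender-and-receiver"

def hmac_tag(key: bytes, data: bytes) -> bytes:
    """Keyed tag (HMAC-SHA256): cannot be recomputed without the key."""
    return hmac.new(key, data, hashlib.sha256).digest()

original = b"pay=100.00;to=alice"
tampered = b"pay=999.00;to=mallory"

# The sender attaches both kinds of tag to the original message.
sent_crc = zlib.crc32(original)
sent_mac = hmac_tag(SECRET_KEY, original)

# CRC case: the attacker swaps the data and recomputes the CRC, no secret needed.
forged_crc = zlib.crc32(tampered)
receiver_accepts_crc = zlib.crc32(tampered) == forged_crc

# HMAC case: without the key, the attacker can only reuse the old tag.
receiver_accepts_mac = hmac.compare_digest(
    hmac_tag(SECRET_KEY, tampered), sent_mac
)

print(receiver_accepts_crc, receiver_accepts_mac)
```

The receiver accepts the forged CRC because it is perfectly valid for the tampered data, while the keyed tag no longer matches.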
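Checksum algorithms are designed to catch accidental errors, not to resist a deliberate attacker. It is easy to construct different data that produces the same CRC, and when the original value is short or predictable, it can be recovered from its CRC simply by trying every candidate. These values should therefore never be used to mask data.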
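To illustrate why a checksum makes a poor mask, here is a hedged Python sketch that recovers a four-digit PIN from its CRC-32 by trying every candidate. The scenario and the function names `mask_with_crc` and `recover_pin` are hypothetical.

```python
import zlib

def mask_with_crc(value: str) -> int:
    """A (bad) masking scheme: store the CRC-32 of the value instead of the value."""
    return zlib.crc32(value.encode("ascii"))

def recover_pin(masked: int):
    """Try every four-digit PIN until one produces the stored CRC."""
    for n in range(10000):
        candidate = f"{n:04d}"
        if mask_with_crc(candidate) == masked:
            return candidate
    return None

stored = mask_with_crc("4821")  # what would sit in the "masked" column
recovered = recover_pin(stored)
print(recovered)
```

Ten thousand CRC computations take a fraction of a second, so the "masked" column gives away the PIN to anyone who can read it. The same search works for any field with a small set of possible values.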
Read more about CRC on Wikipedia. Of all the research I did, the Wiki article was by far the most comprehensive and easiest to digest. And check out this other cool Wiki article for a comparison of file verification applications that can help you out with checksums!
See CRC in action for yourself using the CRC-32 Online Checksum Calculator!