Apr 12, 2012

Checksumming


This section explores checksumming in Java. Checksum is an interface that several different classes implement. Here is what this section covers:
* What is a checksum and how does it work?
* What is a message digest and how does it work?
* Checksum versus message digest


What is checksumming?
         What's a checksum? The Checksum interface (in java.util.zip) defines the methods required to compute a checksum over a set of data. The computation is designed so that if a single bit or two bits in the data change, the value of the checksum changes.
A checksum is computed from the bytes of data supplied to the update() method, which folds a single byte, or an array of bytes, into the running checksum value.
The current value of the checksum can be obtained at any time with the getValue() method, and the reset() method returns the checksum to its initial value (0 for CRC32, 1 for Adler32).
Two checksum classes are covered here. The most common is CRC32, which implements the Cyclic Redundancy Check. The other is Adler32, which computes the Adler-32 checksum. Adler32 is faster to compute and nearly as reliable as CRC32.
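
The update(), getValue() and reset() methods can be tried out directly. The sketch below is illustrative only: the class ChecksumDemo and its helper checksumOf are hypothetical names, not part of the JDK. It computes both checksums of the ASCII string "123456789", whose results are the standard check values for CRC-32 and Adler-32, and it shows that flipping one bit changes the CRC32.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumDemo {
    // Resets the checksum, feeds it every byte of data, and returns the result.
    static long checksumOf(Checksum checksum, byte[] data) {
        checksum.reset();                      // back to the initial value
        checksum.update(data, 0, data.length); // fold the bytes into the running value
        return checksum.getValue();            // only the low 32 bits are ever used
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32:   %08X%n", checksumOf(new CRC32(), data));   // CBF43926
        System.out.printf("Adler32: %08X%n", checksumOf(new Adler32(), data)); // 091E01DE

        byte[] flipped = data.clone();
        flipped[0] ^= 0x01; // change a single bit
        System.out.println(checksumOf(new CRC32(), data) != checksumOf(new CRC32(), flipped)); // true
    }
}
```

Because both classes implement Checksum, the same helper works for either one; swapping CRC32 for Adler32 is a one-word change.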


CheckedInputStream and CheckedOutputStream
          Let's create a Checksum object and use it to filter an input or output stream. When you write a byte to a CheckedOutputStream, the checksum is computed automatically.
CheckedOutputStream passes each byte on to its ultimate destination, the underlying stream, and also calls the update() method of the Checksum object. At any point in time, you can ask the Checksum object for the current checksum. For example, you can write out all your bytes and then append that checksum to the data you have written.
           * CheckedInputStream extends FilterInputStream and maintains a checksum on every read.
           * CheckedOutputStream extends FilterOutputStream and maintains a checksum on every write.
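
The idea of writing everything out and then appending the checksum can be sketched in memory. This is a minimal illustration under stated assumptions: the class CheckedStreamsDemo and its methods are hypothetical, and the choice to store the checksum as an 8-byte long (the type getValue() returns) written straight to the underlying stream is ours, not a fixed format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

public class CheckedStreamsDemo {
    // Writes the payload through a CheckedOutputStream, then appends the CRC32 as an 8-byte long.
    static byte[] writeWithTrailer(byte[] payload) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        CRC32 crc = new CRC32();
        CheckedOutputStream cos = new CheckedOutputStream(raw, crc);
        cos.write(payload);                 // each byte goes to raw and into crc
        DataOutputStream trailer = new DataOutputStream(raw);
        trailer.writeLong(crc.getValue());  // written to raw directly, so not checksummed
        trailer.flush();
        return raw.toByteArray();
    }

    // Reads the payload back through a CheckedInputStream and compares it with the trailer.
    static boolean verify(byte[] framed) throws IOException {
        CRC32 crc = new CRC32();
        DataInputStream in = new DataInputStream(
                new CheckedInputStream(new ByteArrayInputStream(framed), crc));
        byte[] payload = new byte[framed.length - Long.BYTES];
        in.readFully(payload);              // reading updates crc as a side effect
        long computed = crc.getValue();     // capture before the trailer is read
        long stored = in.readLong();
        return computed == stored;
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = writeWithTrailer("Hello, checksum".getBytes(StandardCharsets.US_ASCII));
        System.out.println(verify(framed)); // true
        framed[0] ^= 0x20;                  // flip one bit of the payload
        System.out.println(verify(framed)); // false
    }
}
```

The reader captures the checksum before reading the trailer, so the trailer bytes passing through the CheckedInputStream do not affect the comparison.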


Adding a checksum
          Use a checksum to make sure that data was transmitted properly. Construct a CheckedOutputStream from an OutputStream and an object of the Checksum type. CRC32 and Adler32 are the two types of checksums you can choose from, or you can create your own by implementing the Checksum interface. At regular intervals, say every 512 bytes, retrieve the current value of the checksum and send those four bytes over the stream (getValue() returns a long, but only its low 32 bits are used).
On the receiving end, construct a CheckedInputStream from an InputStream and an object of the same type you used for the CheckedOutputStream. After reading 512 bytes, retrieve the current value of the checksum and compare it to the next four bytes read from the stream.
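
The periodic scheme described above might look like the following sketch. Several details are assumptions made for illustration: the class BlockChecksum and its methods are hypothetical, the checksum bytes are written big-endian directly to the underlying stream so they are not themselves checksummed, the running value is never reset between blocks, and the receiver is assumed to know the total payload length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

public class BlockChecksum {
    static final int BLOCK_SIZE = 512; // block size from the text; an assumption, not a fixed rule

    // Sender: after each block, append the four low bytes of the running CRC32 to the raw stream.
    static byte[] send(byte[] payload) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        CRC32 crc = new CRC32();
        CheckedOutputStream data = new CheckedOutputStream(raw, crc);
        DataOutputStream trailer = new DataOutputStream(raw); // bypasses the checksum
        for (int off = 0; off < payload.length; off += BLOCK_SIZE) {
            int len = Math.min(BLOCK_SIZE, payload.length - off);
            data.write(payload, off, len);          // also updates crc
            trailer.writeInt((int) crc.getValue()); // running value, not reset between blocks
        }
        trailer.flush();
        return raw.toByteArray();
    }

    // Receiver: read each block, then compare the running CRC32 with the next four bytes.
    // Assumes the receiver already knows the payload length (hypothetical framing choice).
    static byte[] receive(byte[] framed, int payloadLength) throws IOException {
        ByteArrayInputStream raw = new ByteArrayInputStream(framed);
        CRC32 crc = new CRC32();
        DataInputStream data = new DataInputStream(new CheckedInputStream(raw, crc));
        DataInputStream trailer = new DataInputStream(raw); // bypasses the checksum
        byte[] payload = new byte[payloadLength];
        for (int off = 0; off < payloadLength; off += BLOCK_SIZE) {
            int len = Math.min(BLOCK_SIZE, payloadLength - off);
            data.readFully(payload, off, len);      // also updates crc
            int expected = trailer.readInt();
            if (expected != (int) crc.getValue()) {
                throw new IOException("checksum mismatch in block starting at byte " + off);
            }
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1300];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        byte[] framed = send(payload);
        System.out.println(framed.length); // 1312: 1300 data bytes + 3 four-byte checksums
        receive(framed, payload.length);   // returns normally when every block checks out
    }
}
```

Mixing a CheckedInputStream and a plain DataInputStream over the same source only works because neither reads ahead; with a buffered stream in between, the two readers would fall out of step.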

CheckedOutputStream: Example code
          Let's look at some sample code. We open a FileOutputStream to Temp1.tmp. An object of the CRC32 type is constructed and used to create a CheckedOutputStream. As bytes are written to the file one at a time, the checksum is computed. At the end, we ask the Checksum object for the current value of the checksum.
Keep in mind that a checksum is only a 32-bit value. If a single bit changes, you will notice it, but if many bits change, you could possibly get the same checksum.

Here is the example code:

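import java.io.FileOutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;

FileOutputStream os = new FileOutputStream("Temp1.tmp");
CRC32 crc32 = new CRC32();
CheckedOutputStream cos = new CheckedOutputStream(os, crc32); // every write also calls crc32.update()
cos.write(1);
cos.write(2);
cos.close(); // flushes and closes the file
long crc = crc32.getValue(); // CRC-32 of the bytes written so far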
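
The example only covers the writing side. A minimal sketch of checking the file afterwards, assuming a hypothetical FileChecksum class, reads Temp1.tmp back through a CheckedInputStream and compares the two values; try-with-resources closes both streams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

public class FileChecksum {
    // Writes the bytes to the file through a CheckedOutputStream and returns their CRC32.
    static long write(Path file, byte[] data) throws IOException {
        CRC32 crc = new CRC32();
        try (OutputStream os = new CheckedOutputStream(Files.newOutputStream(file), crc)) {
            os.write(data);
        }
        return crc.getValue();
    }

    // Reads the whole file back through a CheckedInputStream and returns its CRC32.
    static long read(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = new CheckedInputStream(Files.newInputStream(file), crc)) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // reading is enough: CheckedInputStream updates crc as bytes pass through
            }
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("Temp1.tmp");
        long written = write(file, new byte[] {1, 2});
        long readBack = read(file);
        System.out.println(written == readBack); // true
    }
}
```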


Digesting
          Let's create something larger than a checksum. A larger set of bits that represents a sequence of bytes is a message digest. A message digest works like a checksum: it is a one-way hash that takes a variable amount of data and creates a fixed-length hash value, often called a digital fingerprint.
If somebody changes or manipulates the contents of the original sequence, the message digest will not look the same. This property is what makes message digests useful for security. There is a possibility, but a very small one, that a changed message would produce the same message digest.
A sequence of bytes is transformed into a digest by a MessageDigest object. The MessageDigest class has a static factory method called getInstance(), which is given the name of the digest algorithm that we want to use. Standard names include "MD5", "SHA-1" and "SHA-256"; the details of the algorithms can be found in cryptography references.
The update() methods update the digest with a byte or an array of bytes. The digest() method completes the computation and returns the digest as an array of bytes, after which the MessageDigest is reset.
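
The getInstance(), update() and digest() calls can be seen together in a short sketch. DigestDemo and its toHex helper are hypothetical names. The sketch uses SHA-256 because MD5 and SHA-1 are no longer considered collision resistant; the printed values are the published digests of "abc" and of the empty message, and the second digest() call shows the reset described above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestDemo {
    // Formats a digest as lowercase hex, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update("abc".getBytes(StandardCharsets.US_ASCII)); // feed the message
        System.out.println(toHex(md.digest()));
        // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

        // digest() reset md, so calling it again hashes the empty message
        System.out.println(toHex(md.digest()));
        // e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

        MessageDigest md5 = MessageDigest.getInstance("MD5");
        System.out.println(toHex(md5.digest("abc".getBytes(StandardCharsets.US_ASCII))));
        // 900150983cd24fb0d6963f7d28e17f72
    }
}
```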


DigestInputStream and DigestOutputStream
           The stream counterparts for digests are DigestInputStream and DigestOutputStream. Give the DigestInputStream an input stream and the MessageDigest that will do its calculations. You can turn digesting on or off for a stream with the on() method, which is useful when you want to read or write some data without including it in the message digest.
The read() and write() methods update the MessageDigest in DigestInputStream and DigestOutputStream respectively.
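
Turning digesting off and on can be illustrated with a small sketch. The class DigestStreamDemo, the method digestBodyOnly and the idea of a fixed-length "header" that should be excluded from the digest are all hypothetical; the point is only that bytes read while on(false) is in effect never reach the MessageDigest.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestStreamDemo {
    // Reads a fixed-length header with digesting turned off, then digests the rest.
    static byte[] digestBodyOnly(byte[] message, int headerLength)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (DigestInputStream in = new DigestInputStream(new ByteArrayInputStream(message), md)) {
            in.on(false);                       // header bytes are read but not digested
            for (int i = 0; i < headerLength; i++) {
                if (in.read() == -1) {
                    break;
                }
            }
            in.on(true);                        // from here on, reads update md
            byte[] buffer = new byte[1024];
            while (in.read(buffer, 0, buffer.length) != -1) {
                // nothing else to do: the stream feeds md as it reads
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] message = "HEADERbody".getBytes(StandardCharsets.US_ASCII);
        byte[] bodyOnly = digestBodyOnly(message, 6);
        byte[] expected = MessageDigest.getInstance("SHA-256")
                .digest("body".getBytes(StandardCharsets.US_ASCII));
        System.out.println(MessageDigest.isEqual(bodyOnly, expected)); // true
    }
}
```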

MessageDigest: Example code
          The first line creates a FileOutputStream going to the file Temp.tmp. Next, create a MessageDigest object. As you can see in the call md = MessageDigest.getInstance(...), you pass it the name of the message digest algorithm that you want to use, and it returns an object of the MessageDigest type. If the String that you pass is not the name of an available digest algorithm, it throws a NoSuchAlgorithmException.
After you have the MessageDigest, use it to construct a DigestOutputStream. Bytes are written out to the underlying OutputStream and added to the digest. You get the digest itself by calling md.digest(), which returns it as an array of bytes. Now you can store it away or send it along.

Here is the example code:

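import java.io.FileOutputStream;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

FileOutputStream os = new FileOutputStream("Temp.tmp");
MessageDigest md;
try
{
md = MessageDigest.getInstance("SHA-1");
}
catch(NoSuchAlgorithmException e)
{
// Without a digest there is nothing to compute, so stop instead of continuing with null
throw new IllegalStateException("SHA-1 is not available", e);
}
DigestOutputStream dos = new DigestOutputStream(os, md); // every write also calls md.update()
dos.write(1);
dos.write(2);
dos.close(); // flushes and closes the file
byte[] digest = md.digest(); // 20-byte SHA-1 digest; md is reset afterwards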
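
Whoever receives the stored digest can check the file by recomputing it. The sketch below assumes a hypothetical FileDigest class: it writes through a DigestOutputStream, reads the same file back through a DigestInputStream, and compares the two results with MessageDigest.isEqual().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileDigest {
    // Writes data to the file through a DigestOutputStream and returns the digest of what was written.
    static byte[] write(Path file, byte[] data, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (OutputStream out = new DigestOutputStream(Files.newOutputStream(file), md)) {
            out.write(data);
        }
        return md.digest();
    }

    // Reads the file back through a DigestInputStream and returns the digest of its contents.
    static byte[] read(Path file, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // the stream updates md as bytes are read
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get("Temp.tmp");
        byte[] stored = write(file, new byte[] {1, 2}, "SHA-1");
        byte[] recomputed = read(file, "SHA-1");
        System.out.println(MessageDigest.isEqual(stored, recomputed)); // true
    }
}
```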


Checksum versus MessageDigest
          Remember, a checksum is small. A checksum is typically used to verify the integrity of data sent over a noisy transmission line or to verify whether a file still contains the same data. A message digest, on the other hand, is much larger. It is used more for security, to ensure that a message has not been tampered with.
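
To put numbers on "small" and "much larger", a short sketch (the class SizeComparison and its helper digestBits are hypothetical names) asks the installed provider for the output length of a few digest algorithms via getDigestLength() and prints them next to the fixed 32-bit width of CRC32 and Adler32.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SizeComparison {
    // Output size of a digest algorithm in bits, as reported by the provider.
    static int digestBits(String algorithm) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(algorithm).getDigestLength() * 8;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // CRC32 and Adler32 return a long from getValue(), but only its low 32 bits are used
        System.out.println("CRC32 / Adler32: 32 bits");
        for (String algorithm : new String[] {"MD5", "SHA-1", "SHA-256"}) {
            System.out.println(algorithm + ": " + digestBits(algorithm) + " bits");
        }
        // MD5: 128 bits, SHA-1: 160 bits, SHA-256: 256 bits
    }
}
```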


