Fixed Input vs. Variable Input Compression
As a number of you know, I have been blogging about the merits of Real-time Compression. It may be of some interest to know that when Ed Walsh, CEO of Storwize, asked me to join and told me the company focused on "compression", I first thought he was joking. After all, the industry has had compression available for years. The reality is that no other vendor offers a technology like Real-time Compression, and it is even clearer today why IBM chose to own this technology. In the next few blog pieces I plan to talk about a few of the concepts behind the IP that put this technology so far ahead of its competition.
Today's piece is about fixed input versus variable input compression. The concept is really very simple. Traditional compression uses a process called 'fixed input' / 'variable output'. If we refer to the diagrams below, we start off with the original file on the left and the compressed file on the right. Traditional compression works as follows (and you can actually watch this on your home computer if you zip a file with WinZip): the compression algorithm 'chunks up' the original file into 'manageable' sizes before it compresses the file. Each input chunk is the same fixed size, while the compressed output of each chunk varies in size depending on how compressible its contents are.
The reason this happens, as with most things in computer science, is a tradeoff: performance is bought at the expense of optimization. The first diagram shows the large file being 'chunked up', compressed and stuffed into the smaller file.
There are two significant issues with this. The first issue is that the compression dictionary is not shared across multiple ‘chunks’ when compression is taking place. The example in Figure 2 shows that the letter “F” in ‘chunk’ 1 is not compressed together with the letter “F” in ‘chunk’ 4. Redundancy that spans chunk boundaries is never exploited, so the compression ratio is not optimized across the file as a whole.
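To make the first issue concrete, here is a minimal sketch in Python. It uses the standard library's zlib (DEFLATE) purely as a stand-in for "traditional compression"; it is not the algorithm used by Real-time Compression, and the 4096-byte chunk size, the test data and the function names are assumptions chosen for illustration. The data is one block of random bytes repeated eight times, so all of the redundancy lies between chunks rather than inside any single chunk.

```python
import random
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed input size, chosen for illustration


def compress_in_chunks(data: bytes, chunk_size: int) -> int:
    """Compress each fixed-size chunk on its own and return the total size.

    Every call to zlib.compress starts with an empty dictionary, so nothing
    learned from one chunk can help compress another.
    """
    total = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total += len(zlib.compress(chunk))
    return total


def compress_whole(data: bytes) -> int:
    """Compress the data as a single stream and return the compressed size."""
    return len(zlib.compress(data))


# One block of pseudo-random bytes, repeated 8 times. Each chunk looks
# incompressible by itself, but chunks 2 to 8 are exact copies of chunk 1.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(CHUNK_SIZE))
data = block * 8

chunked_size = compress_in_chunks(data, CHUNK_SIZE)
whole_size = compress_whole(data)

print("original bytes:          ", len(data))
print("chunks compressed alone: ", chunked_size)
print("compressed as one stream:", whole_size)
```

When the chunks are compressed independently, the repeats are invisible to the compressor and the total stays close to the original size. Compressed as a single stream, the later copies can refer back to the first one, so the result is much smaller. Note that DEFLATE itself only looks back 32 KB, so this sketch shows the effect only for repeats that fall within that distance.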









