Hello! Today I will be benchmarking several popular compression algorithms and sharing the test conditions and results. The algorithms under examination are GZip, BZip2, LZMA/2, LZ4, and LZO.
These files were selected:
Size: 579,962,880 bytes (~553.1 MiB)
Size: 49,678,811 bytes (~47.4 MiB)
These machines were used:
Model: Online.net Dedibox LT 2017
CPU: Intel Xeon E3-1240 v6 @ 3.7GHz (4.1GHz), quad-core HT CPU, 72W TDP
System: CentOS 7 (kernel 3.10.0-693.17.1.el7.x86_64)
RAM: 32GiB DDR4 ECC
These program versions were fetched and compiled for use on the above systems:
Size: 2,590,720 bytes (~2.47 MiB)
Size: 5,253,120 bytes (~5.01 MiB)
Size: 1,361,920 bytes (~1.30 MiB)
Size: 5,654,480 bytes (~5.39 MiB)
Size: 3,328,000 bytes (~3.17 MiB)
These were the settings used to compile all of the above programs:
CC=clang
CFLAGS+=-O2 -pipe
All symbols stripped (strip -s)
For testing, two shell scripts were built to track and record the performance of the algorithms, available as a Gist. The first is compressbench.sh, which does most of the heavy lifting. The other script, iterbench.sh, wraps the first script and runs it 6 times, keeping each successive result; this makes it possible to average the output of algorithms whose differences fall within the margin of error at certain (usually lighter) compression levels.
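To give a feel for the repeat-and-average idea, here is a minimal sketch in Python. It is not the contents of compressbench.sh or iterbench.sh: it assumes Python's standard-library gzip, bz2 and lzma bindings as stand-ins for the command-line programs, and the function names bench_once and bench_repeated are made up for illustration.

```python
import bz2
import gzip
import lzma
import statistics
import time

# Assumption: stdlib bindings stand in for the command-line tools.
COMPRESSORS = {
    "gzip": lambda data, level: gzip.compress(data, compresslevel=level),
    "bzip2": lambda data, level: bz2.compress(data, compresslevel=level),
    "xz": lambda data, level: lzma.compress(data, preset=level),
}


def bench_once(name, data, level):
    """Time one compression run; return (seconds, compressed size in bytes)."""
    start = time.perf_counter()
    packed = COMPRESSORS[name](data, level)
    return time.perf_counter() - start, len(packed)


def bench_repeated(name, data, level, runs=6):
    """Repeat the run and average the timings to smooth out noise."""
    results = [bench_once(name, data, level) for _ in range(runs)]
    times = [seconds for seconds, _ in results]
    return {
        "algo": name,
        "level": level,
        "runs": runs,
        "mean_time": statistics.mean(times),
        "size": results[0][1],  # output size does not vary between runs
    }


if __name__ == "__main__":
    # Placeholder input; a real benchmark would read the test files instead.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 5000
    for algo in COMPRESSORS:
        print(bench_repeated(algo, sample, level=1))
```

Averaging matters most at the light levels, where a single run can finish quickly enough that scheduler noise is a sizeable fraction of the measurement.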
The raw CSV data for these results can be viewed here. The algo column lists the program in use; xze is the same as xz except with the -e flag passed as well.
The time columns (such as dtime) are measured in seconds.
The memory columns (such as dmem) are measured in kibibytes (multiples of 1024 bytes).
size is measured in bytes, which can be compared to the original file sizes shown above.
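For anyone who wants to turn the raw numbers into something comparable, a small sketch like the following could work. It assumes the CSV has a header row containing at least the algo, dmem and size columns described above; the function name summarize is hypothetical, and the default original size is the larger of the two test files.

```python
import csv
import io

# Size in bytes of the larger test file listed earlier.
ORIGINAL_SIZE = 579_962_880


def summarize(csv_text, original_size=ORIGINAL_SIZE):
    """Return (algo, compressed/original ratio, dmem in MiB) for each row.

    Assumes a header row with 'algo', 'dmem' (KiB) and 'size' (bytes).
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ratio = int(row["size"]) / original_size
        mem_mib = float(row["dmem"]) / 1024  # KiB -> MiB
        rows.append((row["algo"], ratio, mem_mib))
    return rows
```

A ratio of 0.25 here would mean the output is a quarter of the original size; lower is better. For results from the smaller file, pass its size as original_size instead.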
The results here are pretty interesting. Here are some quick facts:
From what the data shows, LZMA/2 is still the reigning champion for raw performance. I can understand why BZip2 has fallen out of favour, judging from its resource usage and its inferiority to LZMA/2, and yet GZip has stuck around, if not for legacy reasons, then for its speed and consistency.
LZO and LZ4 are very interesting for how much compression they give up in favour of speed. Such algorithms can easily end up I/O limited, even on solid state media: when the compressor keeps pace with the disk, compression costs no extra time, and the more tightly the data is packed, the less there is to read or write and the more time is saved. That is a lovely space-saving proposition, and in systems that handle large amounts of text or other compressible assets, using one would be a no-brainer.
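The I/O-limited argument can be put into a back-of-the-envelope model. This sketch assumes that reading from disk and decompressing overlap perfectly, so the slower of the two sets the total time; the function name read_time and all of its parameters are made up for illustration, not taken from the benchmark scripts.

```python
def read_time(original_bytes, ratio, disk_bps, decompress_bps=None):
    """Estimate the seconds needed to load a file from disk.

    ratio          -- compressed size / original size
    disk_bps       -- disk read throughput in bytes per second
    decompress_bps -- decompressor throughput in uncompressed bytes per
                      second; None means the file is stored uncompressed
    Assumption: disk reads and decompression are fully pipelined.
    """
    if decompress_bps is None:
        return original_bytes / disk_bps
    io_time = original_bytes * ratio / disk_bps   # fewer bytes to read
    cpu_time = original_bytes / decompress_bps    # work to unpack them
    return max(io_time, cpu_time)
```

While the decompressor outpaces the disk, io_time dominates and shrinks in proportion to the ratio, which is the "saves more time the harder it was packed" effect. Once cpu_time takes over, a slower but stronger algorithm stops paying off.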
Until next time,