How would you compress your MySQL Backup
Article taken from <a
href="http://www.mysqlperformanceblog.com/2008/06/05/how-would-you-compress-your-mysql-backup/">
here</a>.
When backing up a MySQL database, most people compress the backup.
Depending on the circumstances and the approach used, this can make
good sense in terms of backup and recovery speed as well as space
needed, or it can be a serious bottleneck. First I should mention
that this question mainly arises for medium and large databases: for
databases below 100GB in size, compression performance is usually not
the problem (though the backup's impact on server performance may
well be).
We also assume the backup is done at the physical level here (cold
backup, slave backup, InnoDB hot backup or snapshot backup), as this
is the only practical way at this point for databases of decent size.
The two important compression questions you need to decide for a
backup are where to do the compression (on the source or the target
server, if you back up over the network) and which compression
software to use.
Compression on the source server is the most typical approach, and it
works well, though it takes extra CPU resources on the source server
in addition to IO resources, and these may not be available,
especially for a CPU-bound MySQL load. The benefit in this case is a
smaller space requirement if you’re keeping a local copy, as well as
lower network bandwidth requirements if you’re backing up to network
storage.
Compression on the destination server offloads the source server
(though the destination may run out of CPU itself if it is the target
for multiple backups), and it requires more network bandwidth to
transfer the uncompressed backup. What about the compression tool?
The classical tool used for backup compression is gzip: it exists
almost everywhere, it is stable and relatively fast.
In many cases, however, it is not fast enough and becomes the
bottleneck for the whole backup process.
Recently I did a little benchmark compressing a 1GB binlog file with
gzip (compression was done from the OS cache and the output was
redirected to /dev/null, so we only measure compression speed). The
test box had an Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz. gzip
compressed this file in 48 seconds (with default options), resulting
in a 260MB compressed file. This gives us a compression speed of
about 21MB/sec, clearly much less than even a single SATA hard drive
can read sequentially. The file then takes about 10 seconds to
decompress, meaning the compressed file is read at 26MB/sec during
decompression. This is again much less than hard drive sequential
read performance, though the fact that this produces about 100MB/sec
of uncompressed data to write is more of an issue.
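The throughput figures above follow from simple division. As a quick check, here is a small Python sketch that derives them from the file sizes and timings quoted in the text. The function name `rates` is made up for this illustration, and 1GB is treated as 1024MB, which is an assumption about how the original numbers were rounded.

```python
def rates(original_mb, compressed_mb, compress_s, decompress_s):
    """Return (compression speed, archive read speed, uncompressed write speed) in MB/sec."""
    compress_speed = original_mb / compress_s     # input consumed while compressing
    archive_read = compressed_mb / decompress_s   # archive consumed while decompressing
    restore_write = original_mb / decompress_s    # uncompressed data produced on restore
    return compress_speed, archive_read, restore_write


# Default gzip run from the text: 1GB -> 260MB in 48s, decompressed in about 10s.
labels = ("compression", "archive read", "uncompressed write")
for label, value in zip(labels, rates(1024, 260, 48, 10)):
    print(f"{label}: {value:.1f} MB/sec")
```

The same function can be fed the timings of the other compressors discussed in this article to compare them on equal terms.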
Such performance also means that if your goal is faster local network
transfer, default gzip compression will not speed things up on a
standard point-to-point 1Gbit network connection. If we try gzip -1
to get the fastest compression, the same file compresses to 320MB in
27 seconds. This gives us 37MB/sec, which is a lot better but still
not quite enough. Also note the serious jump in compressed file size.
In this example, though, we used a MySQL binary log file, which often
contains plenty of similar events, and this could be the reason for
such a large difference in size between compression levels.
Decompression takes about the same 10 seconds, which gives about
32MB/sec of archive read speed and the same 100MB/sec of uncompressed
data.
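To repeat this kind of level comparison on your own data, a minimal sketch along the following lines may help. It uses Python's `zlib` module (the same DEFLATE algorithm gzip uses) rather than the gzip command itself, compresses from memory and discards the output, and runs on synthetic data that only loosely imitates a binlog full of similar events. The helper name `bench` and the sample data are assumptions for illustration, so the numbers it prints are not comparable to the ones above.

```python
import time
import zlib


def bench(data, level):
    """Compress data in memory at the given zlib level; return (seconds, compressed size)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(compressed) == data  # sanity check: the round trip is lossless
    return elapsed, len(compressed)


if __name__ == "__main__":
    # Synthetic stand-in for a binlog: many similar "events" (an assumption, not real data).
    data = b"".join(
        b"UPDATE t SET c=%d WHERE id=%d;\n" % (i % 97, i) for i in range(200_000)
    )
    size_mb = len(data) / 2**20
    for level in (1, 6):  # 1 = fastest, 6 = the level zlib normally uses by default
        seconds, size = bench(data, level)
        print(f"level {level}: {size_mb / seconds:.1f} MB/sec, "
              f"{size / len(data):.1%} of original size")
```

Replacing the synthetic data with the contents of a real file gives a rough idea of how much speed the faster level buys and how much compressed size it costs.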
Do we have any faster alternatives to gzip? There are actually quite
a few, but I like LZO, which I have been playing with since the late
1990s and which is a rather active project. There is also a gzip-like
command-line compressor using the LZO library, called lzop, which
makes it an easy drop-in replacement. I used an lzop binary which was
built against LZO 1.0; the more recent version 2.0 promises further
performance improvements, especially on 64-bit systems.
With LZO default compression, the file compressed in 10.5 seconds and
resulted in a 390MB compressed file. This gives us a 97MB/sec
compression speed, which is good enough to compress all the data you
can read from a single drive. The file decompresses in 3.7 seconds,
which gives a 105MB/sec read speed from the archive media and a
276MB/sec write speed to the hard drive. This means restoring from a
backup compressed with LZO will often be as fast as, or faster than,
restoring from an uncompressed one.
With LZO there is also a “-1” option for even faster compression,
which had rather interesting results. The file compressed in 10.0
seconds (102MB/sec) and was 385MB in size, so this lower compression
level actually compressed the file a bit better while being about 5%
faster. The decompression speed was about the same. I’m sure the
results may change based on the data being compressed, but it looks
like LZO already uses relatively fast compression by default.
With a real server-grade CPU the performance should be even better,
meaning you should get more than the roughly 100MB/sec you can pass
through 1Gbit Ethernet. This means you can actually use LZO
compression for faster data transfer between boxes (e.g. together
with netcat).
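Whether compression speeds up a network transfer can be estimated with a rough model: the compressor, the link and the decompressor run as a pipeline, and the slowest stage sets the pace. The sketch below applies that model to the benchmark figures from this article. The function name `pipeline_rate`, the 100MB/sec link figure and the assumption that the stages overlap perfectly are simplifications for illustration, not measurements.

```python
def pipeline_rate(compress_mb_s, decompress_mb_s, link_mb_s, ratio):
    """Uncompressed MB/sec through compress -> network -> decompress.

    ratio is compressed size / original size. The stages are assumed to run
    concurrently, so the slowest one limits the whole transfer.
    """
    # Each MB sent over the wire carries 1/ratio MB of original data.
    link_as_uncompressed = link_mb_s / ratio
    return min(compress_mb_s, link_as_uncompressed, decompress_mb_s)


LINK = 100.0  # roughly what 1Gbit Ethernet carries, as stated in the text

# Speeds are in terms of uncompressed data: 1024MB divided by the measured seconds.
gzip_rate = pipeline_rate(1024 / 48, 1024 / 10, LINK, 260 / 1024)
lzo_rate = pipeline_rate(1024 / 10.5, 1024 / 3.7, LINK, 390 / 1024)

print(f"uncompressed copy: {LINK:.0f} MB/sec")
print(f"gzip (default):    {gzip_rate:.0f} MB/sec")
print(f"lzo (default):     {lzo_rate:.0f} MB/sec")
```

Under this model both compressors are limited by compression speed on the test box: gzip stays far below the link speed, while LZO lands close to it, so a faster CPU is what would push LZO past an uncompressed copy.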
Note that my benchmarks also include the overhead of reading (from
the file cache) and piping to /dev/null. Since this overhead is
constant, the true difference in compression speed is even larger,
though as most backup operations need to read and write anyway, they
naturally come with this static overhead.
UPDATE: It looks like people are wondering how bzip2 compares, so I
checked it before deleting this particular file. bzip2 compression of
this file took 298 seconds, which is just 3.4MB/sec, though the
compressed file was just 174MB in size. Decompression took 78
seconds, which means compressed data was read at 2.2MB/sec and the
result was generated at 13MB/sec.
For all archivers it is possible to use parallel compression to get
better speed, though this also means a higher load, which can be an
issue if you’re not using a dedicated server for backups.
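One way to picture parallel compression is to split the input into chunks, compress the chunks concurrently, and concatenate the results. The sketch below does this with Python's standard `gzip` module, relying on the fact that concatenated gzip members form a valid gzip stream. It is only an illustration of the idea, not how any particular parallel compressor is implemented, and the function name `parallel_gzip`, the chunk size and the worker count are arbitrary choices.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data, chunk_size=1 << 20, workers=4, level=6):
    """Compress fixed-size chunks concurrently and join them into one gzip stream."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # zlib releases the GIL while compressing, so threads can use several cores.
        # pool.map keeps the results in input order.
        members = pool.map(lambda chunk: gzip.compress(chunk, compresslevel=level), chunks)
        return b"".join(members)


payload = b"INSERT INTO t VALUES (1, 'row');\n" * 200_000
archive = parallel_gzip(payload)
# A standard gzip reader accepts the multi-member stream and restores the input.
assert gzip.decompress(archive) == payload
print(f"{len(payload)} bytes -> {len(archive)} bytes")
```

Compressing chunks independently usually costs a little compression ratio, because each chunk starts without the history of the previous one, and it uses more CPU at once, which is exactly the extra load mentioned above.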
I should also note that for mysqldump backups, tools with better but
slower compression typically make sense, because it takes longer to
dump and much longer to load into the database anyway, so the overall
compression impact is smaller than for a physical-level backup.