|
Recovery beyond data restore |
|
|
|
|
Written by Hemanshu
|
|
Friday, 22 August 2008 |
Recovery beyond data restore
Quite frequently I see customers looking at recovery as on ability
to restore data from backup which can be far from being enough to
restore the whole system to operating state, especially for complex
systems. Instead of looking just at data restore process you better
look at the whole process which is required to bring system to the
working state, including data consistency requirements and times.
This has to be considered for different data loss scenarios which
may happen.
Let us look at simple example - a master with 1TB of database size
replicating to 50 servers in 5 different Data Centers via single
Replication Relay server in each. Forget the single point of
failure for the second and just think what problems we may have to
deal with. First lets look at the master. What may happen to it ?
We can have Master having soft crash in which case it will be
unavailable for some time but we can get all the data back…
or sort of. In practice you have to be very careful such as using
sync-binlog innodb_flush_logs_at_trx_commit=1 and only using Innodb
tables to be OK in most cases. There are still some edge cases such
as modifying meta data stored in MyISAM tables which can get master
out of sync with slaves in case of soft crash. Unless you got into
one of these rare cases slave should be able to continue after
Master is back online. Do you have to wait for master to recover ?
This is where your data consistency requirements come in play.
Remember replication is asynchronous so whenever you switch to the
slave in case of master failure you may loose transactions. Google
semi-synchronous replication patches can help with it… but
they are not yet in the stock MySQL. Yet another way is using DRBD
to get a standby MySQL server or at least synchronously replicated
master binary logs. If you can’t loose any single transaction
you’ve can’t simply switch to the slave. What if you
can ? The switch to the slave in this case is not very easy too -
all slaves can be on different positions at the master and you need
to pick the most up to date to promote. Plus you need to recompute
positions as they should be on promoted slave and slave should have
–log-slave-updates enabled so it somethat has copy of master
logs. In many cases I’ve seen people do not do that and
simply point slaves to the starting position of the promoted master
- this is dangerous because you’re risking all slaves to be
inconsistent withe each other, plus if server was seriously behind
you’re risking to get major inconsistence because relay logs
will be lost if you just re-point slave. So at least you should
wait for slave to process all its relay logs before re-pointing it.
Interesting enough Google has solution for us again which comes as
“log mirroring” patches which make sure Slaves has copy
of logs as they are on the master. Now what do you do in case of
hard crash this is when the data is lost on the master ? This is
when you have master data lost, such as you have RAID or disk
failure. Though it also can be things like Innodb corruption or
soft crash which you can’t recover promptly enough. In this
case most typically you would plan recovery by switching to the
slave (as described) or standby server via DRBD or SAN.
<mospagebreak> As you can see we never mentioned recovering
from backup so far. It will be needed in the worst case of data
loss which is trashing the data which gets to all slaves via
replication. This can be caused by user or application error or
security breach. What choices do you have in this case ? Your main
options are using Backup or Slave with delayed replication (which
you already could have set up with mk-slave-delay from Maatkit.
Delayed slave is especially helpful if application can operate with
just master as in this case you can switch very quickly (just
skipping bad statements and catching up) The main challenge in such
failure is the fact you have many trashed copies to deal with. If
you have just one or several small tables corrupted you can reload
them. One option is to reload them on the master (and they will be
replicated down to all slaves) the faster however (especially if
you have many tiers of replication) is to bring all slaves to the
same point in time and load data locally with SQL_LOG_BIN=0 set for
session. If the large portion of data trashed you may need to
recover full database on all slaves which is best done in binary
mode for large data sets. Such global recovery can also put very
high stress on your network and backup storage and take a lot of
time. It also may be extremely difficult to get the large backup in
timely fashion over long distance network, meaning it is best to
have local backup (and delayed slave if you use one) in each data
center you have. The complexity of recovery is another
“liability” of compex replication tree setup. On the
contrary sharded master-master pairs (or master with few slaves)
are much easier to deal with. Recovering the data with replication
you always have to keep replication positions in mind. Such if you
recover master you need to recover slaves to matching snapshot -
either it has to be same state (which is hard to manage) or you
need to ensure you understand the position on the master to which
backup corresponds to. This becomes more complex if you have
complex replication hierarchy as slave only knows its position on
its own master not on the “root” master. Note there are
also some solutions based on “Continuous Data
Protection” class of backup which can be very helpful to go
back in time with your data. One of vendors offering solution for
MySQL is R1Soft. Though I have not had a chance to look at it in
details. What is about slave loss ? The slave loss is normally less
of the problem. You can reclone slave from the master, another
slave or restore from backup. So this is just question of having
decent capacity planning (such as being able to shut off 2 slaves
and still operate normally), have LVM setup if you want to avoid
shutting off slave or master to clone the data and making sure the
logs on the master go far back so you can restore from several of
backup generations and do point in time recovery. Timing recovery
is also important. Especially for write intensive environments it
may take many days to catchup from weekly backups by binary logs so
make sure to time it properly. In the real life environments can be
even more complicated - one may use partial replication,
replication to different storage engine, add some tables beyond
tables which are being replicated which all has to be accounted for
for in the process of replication. It is also worth to note beyond
these 3 main recovery scenarios there are number of other cases
which you have to deal with (which often can be resolved by doing
recovery be one of these 3 protocols, but you can also take as
shortcut) - for example you may have master or relay binary log
corruption. Master or Slave running out of space, Slave crashing
(and loosing its position on master), Replication breakage (or
running out of sync) due to MySQL bugs or wrong use. Interesting
enough very few people have their data recovery practices ironed
out so they can answer how they would handle at least these 3 data
loss cases for each of servers they have deployed. Even fewer have
gone beyond theory and have tested the processes or have regular
testing in place.
you can find original articles <a
href:"http://www.mysqlperformanceblog.com/2008/08/02/recovery-beyond-data-restore/">here</a>.
|