





















































SBR is based on a simple but effective principle: if two or more machines have the same set of data to begin with, they will remain identical if all of them execute the exact same SQL statements in the same order.
Executing all statements manually on multiple machines would be extremely tedious and impractical. SBR automates this process. In simple terms, it takes care of sending all the SQL statements that change data on one server (the master) to any number of additional instances (the slaves) over the network.
The slaves receiving this stream of modification statements execute them automatically, thereby effectively reproducing the changes the master machine made to its data originally. That way they will keep their local data files in sync with the master's.
One thing worth noting here is that the network connection between the master and its slave(s) need not be permanent. In case the link between a slave and its master fails, the slave will remember up to which point it had read the data last time and will continue from there once the network becomes available again.
In order to minimize the dependency on the network link, the slaves retrieve the binary logs (binlogs) from the master as quickly as they can and store them on their local disk in files called relay logs. This way the connection, which might be some sort of dial-up link, can be closed again much sooner, because the statements are then executed asynchronously from the local relay log. The relay log is essentially just a copy of the master's binlog.
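Once a slave is up and running (as set up in the recipe below), you can watch this two-step mechanism in the output of SHOW SLAVE STATUS: Read_Master_Log_Pos shows how far the I/O thread has copied the master's binlog into the local relay logs, while Exec_Master_Log_Pos shows how far the SQL thread has actually executed. The file names and positions below are made up for illustration:

slave> SHOW SLAVE STATUS\G
...
        Master_Log_File: master-bin.000002
    Read_Master_Log_Pos: 64520
         Relay_Log_File: slave-relay-bin.000004
...
  Relay_Master_Log_File: master-bin.000002
    Exec_Master_Log_Pos: 48100
...

The difference between the two positions is data that has already been copied across the network but is still waiting to be applied locally.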
The following image shows the overall architecture:
In the image you can see that each slave may have its own configuration that determines whether it executes all the statements coming in from the master or only a selection of them. This can be helpful when some slaves are dedicated to special tasks and do not need all the information from the master.
All of the binary logs have to be sent to each slave, even though a slave may then decide to discard most of their contents. Depending on the size of the binlogs, the number of slaves, and the bandwidth of the connections in between, this can put a heavy burden on the network, especially if you are replicating across wide area networks.
Even though the general idea of transferring SQL statements over the wire is rather simple, there are lots of things that can go wrong, especially because MySQL offers some configuration options that are quite counter-intuitive and lead to hard-to-find problems.
For us, this has become a best practice:
"Only use qualified statements and replicate-*-table configuration options for intuitively predictable replication!"
What this means is that the only filtering rules that produce intuitive results are those based on the replicate-do-table and replicate-ignore-table configuration options. This includes those variants with wildcards, but specifically excludes the all-database options like replicate-do-db and replicate-ignore-db. These directives are applied on the slave side on all incoming relay logs.
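As an illustration, a slave that should replicate the complete sakila schema plus a single table from another schema, while skipping one sakila table, could use entries like these (the billing schema and the individual table names are made up for this example):

# slave configuration file, [mysqld] section
replicate-wild-do-table=sakila.%
replicate-do-table=billing.invoices
replicate-ignore-table=sakila.payment

Because each rule names a (possibly wildcarded) schema and table, a statement is replicated or skipped based on the table it actually modifies, which is exactly the predictable behavior the best practice above asks for.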
The master-side binlog-do-* and binlog-ignore-* configuration directives influence which statements are written to the binlog in the first place. We strongly recommend against using them, because apart from producing hard-to-predict results they make the binlogs unsuitable for server backup and point-in-time restore. They are often of limited use anyway, as they do not allow individual configurations per slave but apply to all of them equally.
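To see why the database-level options can backfire with statement-based replication, consider a master configured with binlog-ignore-db=test (a made-up configuration for illustration). Filtering then depends on the default database selected with USE, not on the table a statement actually modifies:

master> USE test;
master> UPDATE sakila.actor SET last_name = 'SMITH' WHERE actor_id = 1;

Because the default database is test, this statement is not written to the binlog at all, even though it changes data in sakila; the slaves silently miss the update. This is exactly the kind of hard-to-predict behavior the best practice above tries to avoid.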
In this recipe, we will show you how to prepare a dump file of a MySQL master server and use it to set up one or more replication slaves. These will automatically be updated with changes made on the master server over the network.
You will need a running MySQL master database server that will act as the replication master and at least one more server to act as a replication slave. This needs to be a separate MySQL instance with its own data directory and configuration. It can reside on the same machine if you just want to try this out. In practice, a second machine is recommended because this technique's very goal is to distribute data across multiple pieces of hardware, not place an even higher burden on a single one.
For production systems you should pick a time when the load on the master machine is lighter, often at night when fewer users are accessing the system. Taking the SQL dump uses some extra resources, but unless your server is already maxed out, the performance impact is usually not a serious problem. Exactly how long the dump takes depends mostly on the amount of data and the speed of the I/O subsystem.
You will need an administrative operating system account on the master and the slave servers to edit the MySQL server configuration files on both of them. Moreover, an administrative MySQL database user is required to set up replication.
We will just replicate a single database called sakila in this example.
Replicating more than one database
In case you want to replicate more than one schema, add their names to the mysqldump call shown below (using its --databases option) and extend the slaves' replicate-wild-do-table settings accordingly. To replicate all databases, use mysqldump's --all-databases option instead; a sketch follows this note.
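As a sketch of what that looks like for the dump step, assuming a second schema called employees exists besides sakila (the output file names are placeholders as well):

$ mysqldump -uUSER -pPASS --master-data --single-transaction --databases sakila employees > /tmp/two_schemas.sql

$ mysqldump -uUSER -pPASS --master-data --single-transaction --all-databases > /tmp/all_schemas.sql

The --databases and --all-databases options also make mysqldump include the corresponding CREATE DATABASE and USE statements in the dump, so those schemas do not have to be created manually on the slave before importing.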
# add to the [mysqld] section of the master's configuration file (my.cnf / my.ini)
# and restart the server for the settings to take effect
server-id=1000
log-bin=master-bin
If one or both entries already exist, do not change them but simply note their values. The log-bin setting need not have a value, but can stand alone as well.
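After restarting the master with these settings, you can verify that binary logging is indeed active; the file name and position you see will of course differ:

master> SHOW MASTER STATUS\G
*************************** 1. row ***************************
            File: master-bin.000001
        Position: 98
    Binlog_Do_DB:
Binlog_Ignore_DB:

An empty result set here would mean that binary logging is still switched off.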
master> grant replication slave on *.* to 'repl'@'%' identified by 'slavepass';
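It can be worth double-checking the new account before continuing, for example by listing its privileges on the master and by trying to connect from the slave machine with the plain mysql command line client (the host name is the placeholder used later in this recipe):

master> SHOW GRANTS FOR 'repl'@'%';

$ mysql -h master.example.com -u repl -p

A failed connection at this point usually hints at firewall or bind-address problems that are much easier to diagnose now than after replication has been configured.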
$ mysqldump -uUSER -pPASS --master-data --single-transaction sakila > /tmp/sakila_master.sql
# add to the [mysqld] section of the slave's configuration file and restart the server
server-id=1001
replicate-wild-do-table=sakila.%
When adding more than one slave, make sure the server-id setting is unique among the master and all slaves.
slave> create database sakila;
slave> use sakila;
slave> source /tmp/sakila_master.sql;
slave> CHANGE MASTER TO master_host='master.example.com',
    master_port=3306, master_user='repl',
    master_password='slavepass';
slave> START SLAVE;
slave> SHOW SLAVE STATUS\G
************************** 1. row ***************************
...
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
Some of the steps described above serve to make sure that master and slave are configured with different server-id settings. This is of paramount importance for a successful replication setup: if you fail to provide unique server-id values for all your server instances, you might see strange replication errors that are hard to debug.
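If in doubt, you can compare the value on each instance; with the example configuration above you would expect 1000 on the master and 1001 on the slave:

master> SHOW VARIABLES LIKE 'server_id';
slave> SHOW VARIABLES LIKE 'server_id';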
Moreover, the master must be configured to write binlogs—a record of all statements manipulating data (this is what the slaves will receive).
Before taking a full content dump of the sakila demo database, we create a user account for the slaves to use. This needs the REPLICATION SLAVE privilege.
Then a data dump is created with the mysqldump command line tool. Notice the provided parameters --master-data and --single-transaction. The former is needed to have mysqldump include information about the precise moment the dump was created in the resulting output. The latter parameter is important when using InnoDB tables, because only then will the dump be created based on a transactional snapshot of the data. Without it, statements changing data while the tool was running could lead to an inconsistent dump.
The output of the command is redirected to the /tmp/sakila_master.sql file. As the sakila database is not very big, you should not see any problems. However, if you apply this recipe to larger databases, make sure you send the data to a volume with sufficient free disk space—the SQL dump can become quite large. To save space here, you may optionally pipe the output through gzip or bzip2 at the cost of a higher CPU load on both the master and the slaves, because they will need to unpack the dump before they can load it, of course.
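A sketch of the compressed variant might look like this (file names are placeholders; the import on the slave assumes the sakila database has already been created there as shown above):

$ mysqldump -uUSER -pPASS --master-data --single-transaction sakila | gzip > /tmp/sakila_master.sql.gz

After copying the file to the slave machine:

$ gunzip < /tmp/sakila_master.sql.gz | mysql -uUSER -pPASS sakila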
If you open the uncompressed dump file with an editor, you will see a line with a CHANGE MASTER TO statement. This is what --master-data is for. Once the file is imported on a slave, it will know at which point in time (well, rather at which binlog position) this dump was taken. Everything that happened on the master after that needs to be replicated.
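Such a line typically looks similar to the following; the binlog file name corresponds to the log-bin setting from the master's configuration, and the position is just an example value:

CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=98;

If you prefer this statement to be included only as a comment, so that it is not executed automatically when the dump is imported, mysqldump accepts --master-data=2 instead.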
Finally, we configure the slave to connect to the master with the credentials set up there before, and then start the replication. Notice that the CHANGE MASTER TO statement used for this does not include any binlog file names or positions, because those were already taken from the dump file just read in.
From here on the slave will go ahead and record all SQL statements sent from the master, store them in its relay logs, and then execute them against the local data set.
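You can actually see the two threads responsible for this, one copying events from the master (the I/O thread) and one executing them (the SQL thread), in the slave's process list. The output below is heavily abridged, and the exact state messages vary between MySQL versions:

slave> SHOW PROCESSLIST\G
...
  User: system user
 State: Waiting for master to send event
...
  User: system user
 State: Has read all relay log; waiting for the slave I/O thread to update it
...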
This recipe is very important because the following recipes build on it! So in case you have not fully understood the above steps yet, we recommend you go through them again before trying out more complicated setups.