Berkeley DB Reference Guide:
Berkeley DB Replication



Berkeley DB never initiates elections, that is the responsibility of the application. It is not dangerous to hold an election, as the Berkeley DB election process ensures there is never more than a single master environment. Clients should initiate an election whenever they lose contact with the master environment, whenever they see a return of DB_REP_HOLDELECTION from the DB_ENV->rep_process_message method, or when, for whatever reason, they do not know who the master is. It is not necessary for applications to immediately hold elections when they start, as any existing master will be quickly discovered after calling DB_ENV->rep_start. If no master has been found after a short wait period, then the application should call for an election.

For a client to become the master, the client must win an election. To win an election, the replication group must currently have no master, the client must have the highest priority of the database environments participating in the election, and at least (N / 2 + 1) of the members of the replication group must participate in the election. In the case of multiple database environments with equal priorities, the environment with the most recent log records will win.

It is dangerous to configure more than one master environment using the DB_ENV->rep_start method, and applications should be careful not to do so. Applications should only configure themselves as the master environment if they are the only possible master, or if they have won an election. An application can only know it has won an election if the DB_ENV->rep_elect method returns success and the local database environment's ID as the new master environment ID, or if the DB_ENV->rep_process_message method returns DB_REP_NEWMASTER and the local database environment's ID as the new master environment ID.

To add a database environment to the replication group with the intent of it becoming the master, first add it as a client. Since it may be out-of-date with respect to the current master, allow it to update itself from the current master. Then, shut the current master down. Presumably, the added client will win the subsequent election. If the client does not win the election, it is likely that it was not given sufficient time to update itself with respect to the current master.

If a client is unable to find a master or win an election, it means that the network has been partitioned and there are not enough environments participating in the election for one of the participants to win. In this case, the application should repeatedly call DB_ENV->rep_start and DB_ENV->rep_elect, alternating between attempting to discover an existing master, and holding an election to declare a new one. In desperate circumstances, an application could simply declare itself the master by calling DB_ENV->rep_start, or by reducing the number of participants required to win an election until the election is won. Neither of these solutions is recommended: in the case of a network partition, either of these choices can result in there being two masters in one replication group, and the databases in the environment might irretrievably diverge as they are modified in different ways by the masters.

It is possible for a less-preferred database environment to win an election if a number of systems crash at the same time. Because an election winner is declared as soon as enough environments participate in the election, the environment on a slow booting but well-connected machine might lose to an environment on a badly connected but faster booting machine. In the case of a number of environments crashing at the same time (for example, a set of replicated servers in a single machine room), applications should bring the database environments on line as clients initially (which will allow them to process read queries immediately), and then hold an election after sufficient time has passed for the slower booting machines to catch up.

If, for any reason, a less-preferred database environment becomes the master, it is possible to switch masters in a replicated environment, although it is not a simple operation. For example, the preferred master crashes, and one of the replication group clients becomes the group master. In order to restore the preferred master to master status, take the following steps:

  1. The preferred master should reboot and re-join the replication group as a client.
  2. Once the preferred master has caught up with the replication group, the application on the current master should complete all active transactions, close all open database handles, and reconfigure itself as a client using the DB_ENV->rep_start method.
  3. Then, the current or preferred master should call for an election using the DB_ENV->rep_elect method.


Copyright Sleepycat Software