Introducing Flipper for managing MySQL master pairs
At Proven Scaling, we’re great fans of using pairs of MySQL servers replicating to each other (commonly known as master-master replication or dual-master replication) as a way of ensuring high availability for MySQL databases.
Deploying servers in this way enables one half of the pair to be taken offline for maintenance work while the other half carries on dealing with queries from clients — meaning that, for instance, lengthy ALTER TABLE operations can be done with no impact on service. This strategy has been in use at many sites for years, and has been very successful at minimizing downtime.
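The maintenance pattern above comes down to a short, fixed order of steps. The Python sketch below only illustrates that order. It is not Flipper's code, and the host names, the example ALTER statement and the plan format are hypothetical. It assumes two things the post does not state, though both are common practice:

- each ALTER runs with sql_log_bin disabled for its session, so the change is not replicated to the half that is serving clients;
- replication can tolerate the two servers briefly having different table definitions.

```python
from typing import List


def rolling_alter_plan(active: str, standby: str, statement: str) -> List[str]:
    """Order the steps for applying one schema change to a master-master pair.

    'active' is the server clients currently use; 'standby' is the other half
    of the pair. Nothing is executed here; the plan is only returned.
    """
    # Assumption: disabling binary logging for this session keeps the ALTER
    # local, so it is not replayed on the other master through replication.
    local_alter = f"SET SESSION sql_log_bin = 0; {statement}"
    return [
        # 1. Change the idle half first; no client queries reach it meanwhile.
        f"on {standby}: {local_alter}",
        # 2. Hand client traffic over to the server that already has the change.
        f"switch clients: {active} -> {standby}",
        # 3. The previously active server is now idle and can take the change.
        f"on {active}: {local_alter}",
    ]


for step in rolling_alter_plan("db1", "db2", "ALTER TABLE orders ADD COLUMN note TEXT"):
    print(step)
```

The same order works whichever server starts out active. That is why the pair as a whole never stops serving clients, even though each server is busy for the full duration of its own ALTER.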
The usual way of implementing this model is to have IP addresses floating between the two MySQL servers. Rather than having the clients use the actual IP addresses or hostnames of the servers themselves, these “floating IPs” (or “virtual IPs”, “VIPs”, “IP aliases”) are used by clients to access MySQL based on a role (typically “writable” and “read-only”). The floating IP addresses can be moved between the servers as required to ensure that each role is always available.
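To make the role-based model concrete, here is a toy Python sketch. It is a hypothetical illustration, not Flipper's data model, and the node names and 10.0.0.x addresses are made up. Each role owns a floating IP, and "failing over" a role just means recording that its IP now lives on a different node. Clients keep using the same address throughout.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MasterPair:
    """Toy model of a master-master pair that clients reach by role."""
    nodes: Set[str]
    role_ip: Dict[str, str]  # role -> floating IP that clients connect to
    placement: Dict[str, str] = field(default_factory=dict)  # role -> node holding that IP

    def move_role(self, role: str, target: str) -> None:
        if role not in self.role_ip:
            raise KeyError(f"unknown role {role!r}")
        if target not in self.nodes:
            raise ValueError(f"{target!r} is not part of this pair")
        # A real tool would drop the IP alias on the old node and bring it up
        # on the new one here; this model only records where it lives.
        self.placement[role] = target

    def endpoint(self, role: str) -> str:
        # Clients only ever see the floating IP, never a server's own address.
        return self.role_ip[role]

    def unplaced_roles(self) -> Set[str]:
        # Roles whose floating IP is not currently up on any node.
        return set(self.role_ip) - set(self.placement)


pair = MasterPair(nodes={"db1", "db2"},
                  role_ip={"writable": "10.0.0.10", "read-only": "10.0.0.11"})
pair.move_role("writable", "db1")
pair.move_role("read-only", "db2")
# db2 needs maintenance: move its role over; clients keep using 10.0.0.11.
pair.move_role("read-only", "db1")
print(pair.placement, pair.endpoint("read-only"))
```

Both roles can legitimately sit on one node while the other node is out of service. The invariant that matters is that `unplaced_roles()` stays empty.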
There are some tools already available to manage pairs of MySQL servers, most notably mysql-master-master (MMM) and the High Availability Linux project.
Today, we’re announcing the release of Flipper, a tool for managing access to MySQL servers using master-master replication.
Reinventing the wheel?
Although the existing tools have worked well for some people in some situations, we (and our customers) have been frustrated by the number of situations where they’re not suitable.
Most of the existing tools are specific to Linux, and therefore no good to users of Solaris, FreeBSD and other operating systems. Most are heavy-weight implementations, with monitoring daemons running all the time. Configuration is rarely simple, sometimes because some of the available solutions try to be all things to all men, doing things that would be better handled elsewhere.
A lot of the effort that’s been put into other tools has been aimed at implementing automatic failover. Sometimes this can be very useful (for instance in stateless applications, restartable services, etc.), but very often this is implemented with little consideration for the possible consequences in a stateful database environment.
Bringing failed servers back into service prematurely (as often happens with hardware load-balancing solutions) can be disastrous, with bad data being returned to clients, or data received from clients and theoretically committed being lost. Likewise, servers may be incorrectly diagnosed as having failed, causing a painful, lengthy, and potentially irreversible failover process to take place for what should have been a barely noticeable event. In some cases, an automatic failover system may change its mind back and forth, causing repeated failover events (known as “flapping”). All in all, we’re not convinced that completely automatic failover is always a good idea¹.
Automated, but manually triggered
Flipper’s design comes from a very pragmatic perspective. It’s a standalone tool that doesn’t require constantly running monitoring daemons — it evaluates the current situation at the moment that it’s executed, and does only what it’s told. It doesn’t attempt to do anything fancy right now; it just manages moving IP addresses between MySQL nodes and reconfiguring a typical master-master setup, in a safe, controlled manner. If one of the MySQL masters fails, it will allow you to move services away from the failed master, enabling you to fix the failure.
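The "evaluate at run time, do only what it's told" approach can be sketched as a pure planning function. This is a hypothetical Python illustration, not Flipper's implementation. The step order follows the switchover sequence that a reader describes in the comments below: make the old master read-only, let the other master catch up, clear read_only on it, then move the writable IP. The `NodeState` fields are assumptions about what such a tool might observe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeState:
    """What a one-shot tool might observe about a node at the moment it runs."""
    name: str
    reachable: bool
    read_only: bool


def plan_writable_move(current: NodeState, target: NodeState) -> List[str]:
    """Plan moving the writable role from 'current' to 'target'.

    No daemon is involved: the plan is derived only from the state observed
    now, and only the move that was asked for is planned.
    """
    if not target.reachable:
        raise RuntimeError(f"refusing to move writable role to unreachable {target.name}")
    steps: List[str] = []
    if current.reachable:
        # Controlled switchover: stop new writes, then let the peer catch up.
        if not current.read_only:
            steps.append(f"SET GLOBAL read_only = 1 on {current.name}")
        steps.append(f"wait until {target.name} has applied all of {current.name}'s binlog")
    else:
        # Failed master: nothing can be drained; writes it had not yet
        # replicated stay unavailable until that server is repaired.
        steps.append(f"{current.name} unreachable: skipping drain")
    if target.read_only:
        steps.append(f"SET GLOBAL read_only = 0 on {target.name}")
    steps.append(f"move writable IP to {target.name}")
    return steps


for step in plan_writable_move(NodeState("db1", True, False), NodeState("db2", True, True)):
    print(step)
```

The failed-master branch still lets an operator move service away from the broken server, as the paragraph above describes. It plans no drain, because there is nothing reachable to drain.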
Flipper has been designed to be as portable as possible. It’s capable of running on almost any UNIX-like operating system, as it’s written in Perl and uses DBD::mysql to communicate with MySQL servers. Flipper itself doesn’t necessarily require any special privileges, user accounts, or daemons; it uses ssh and sudo to run system commands (and you’d typically want to set up SSH keys, and use ssh-agent to avoid typing your passphrase so many times).
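Running system commands through ssh and sudo without a resident agent can be as simple as building the right argument list. The Python sketch below assumes the following, none of which is taken from Flipper itself:

- The `ip addr add … dev eth0` command, the interface name and the address are Linux-specific placeholders. The equivalent command differs on Solaris or FreeBSD.
- `-o BatchMode=yes` makes ssh fail rather than prompt, so keys must already be loaded into ssh-agent.
- `sudo -n` makes sudo fail rather than ask for a password.

```python
import shlex
from typing import List, Sequence


def remote_sudo(host: str, command: Sequence[str]) -> List[str]:
    """Build an argv that runs 'command' as root on 'host' via ssh and sudo.

    BatchMode=yes makes ssh fail instead of prompting for a password or
    passphrase, and 'sudo -n' likewise fails instead of prompting.
    """
    # ssh hands its trailing arguments to the remote shell as one string, so
    # quote each word to make the remote side see exactly these arguments.
    remote = "sudo -n " + shlex.join(command)
    return ["ssh", "-o", "BatchMode=yes", host, remote]


# Hypothetical example: bring up a floating IP on db2 (Linux syntax).
argv = remote_sudo("db2", ["ip", "addr", "add", "10.0.0.10/24", "dev", "eth0"])
print(argv)
```

The list can be passed to `subprocess.run(argv, check=True)`. Because nothing is interpolated into a local shell, host names and arguments cannot be misparsed on the calling side. `shlex.join` requires Python 3.8 or later.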
We will add additional features in the future, but the system will always remain modular — you’ll be able to use whichever parts of it you want.
Where can I find out more?
We’ve set up a new (and currently rather minimal) micro-site for Flipper at provenscaling.com/software/flipper with documentation and links to various resources.
We will also post on this blog when there’s a new release, or some other important Flipper-related news.
¹ Of course, we’d be delighted to hear from anyone who wants to try and convince us that any of the current MySQL automatic failover/HA strategies are error-proof, or anyone who’s got new ideas about how this can be achieved.

I will definitely check this out. My company is currently preparing for a move to a master-master setup, and working on picking a solution for swapping between the two. I am fundamentally wary of anything that is too automatic on this front.
This is a great first release and we plan to evaluate it further. I really love the simple approach (though the configuration could be easier :)).
I have a question, though:
Is there any way to do automatic failover (for example, when the active master server crashes)? I think it would be very unfortunate to end up in a situation where a server crashes and there’s nobody around to do the manual recovery.
Kind regards,
amix
Why not perform slave promotion?
What we do is run an ALTER on the slave, promote the slave to a master, and then take the old master, and perform the ALTER on it…
This way one machine is always in production.
We actually have three replicas now so it’s slightly more complicated but isomorphic to what I just described.
It also requires no custom code or IP setup.
One downside is that you need to hold a cluster-level mutex while you’re doing the promotion. We’re just putting the master in single-user mode while this happens, which is pretty much the same thing.
Kevin
Kevin,
It is essentially slave promotion, only without mucking with the replication settings. By using a floating IP, one machine is effectively always in production (it’s read-only for a fraction of a second while the IP is moved). This lets us run ALTER TABLE on one slave, fail over the cluster, and run ALTER TABLE on the other slave without interrupting clients.
The idea behind the floating IP is that no client configuration change is needed to handle failover. The client simply sees a MySQL instance at the floating IP that is occasionally read-only for a fraction of a second.
-Eric
This is a great technology and we have deployed it in a couple of production instances. I want to understand how the flip happens for writes (the master). I know it puts a read_only lock on the 1st master, then syncs the 2nd master to the 1st, removes the read_only lock from the 2nd master, and then points the virtual IP to the 2nd master. But my question is what happens to existing connections to the 1st master. If an existing connection tries to write, it will get an error, but if it tries to read, it will get stale data (if we start an ALTER TABLE command). How do we guarantee that existing connections to the old master are either kicked out or switched over?
Thanks,
Salim
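One possible approach to the question above is sketched here as an assumption, not as Flipper's documented behavior. After the old master has been made read-only and the writable IP has moved, kill the remaining client connections on it. Clients that reconnect through the floating IP will then land on the new master.

The hypothetical Python helper below only decides which rows of SHOW PROCESSLIST output to kill. It skips the session issuing the KILLs, replication threads ("system user" and "Binlog Dump") and daemon threads. Note also that read_only does not restrict accounts with the SUPER privilege, so such connections can still write until they are closed.

```python
from typing import Any, Iterable, List, Mapping


def kill_statements(processlist: Iterable[Mapping[str, Any]], own_id: int) -> List[str]:
    """Given rows shaped like SHOW PROCESSLIST output (Id, User, Command),
    return KILL statements for client connections that should reconnect."""
    statements: List[str] = []
    for row in processlist:
        if row["Id"] == own_id:
            continue  # keep the session that is issuing the KILLs
        if row["User"] == "system user":
            continue  # this server's own replication I/O and SQL threads
        if row["Command"] in ("Binlog Dump", "Binlog Dump GTID", "Daemon"):
            continue  # the peer's replication connection, event scheduler, etc.
        statements.append(f"KILL {int(row['Id'])}")
    return statements


rows = [
    {"Id": 1, "User": "system user", "Command": "Connect"},
    {"Id": 2, "User": "event_scheduler", "Command": "Daemon"},
    {"Id": 5, "User": "repl", "Command": "Binlog Dump"},
    {"Id": 7, "User": "app", "Command": "Sleep"},
    {"Id": 8, "User": "app", "Command": "Query"},
    {"Id": 9, "User": "admin", "Command": "Query"},
]
print(kill_statements(rows, own_id=9))
```

Idle ("Sleep") connections are killed too. Otherwise a pooled connection could keep reading stale data from the old master long after the switchover.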