| Author: | Dimitri Fontaine |
|---|---|
| Version: | 0.2.2007-07-05 |
Abstract
Having to setup some replication solution on top of some production PostgreSQL database, I've choosen londiste from SkyTools.
First, you may want to know what I needed replication for, then why I wanted to go with londiste. Here's the first part of the story, describing the project and why I do need some sort of replication, then how to choose a solution and getting at last to the true real story, londiste first steps.
At work we've got this venerable solution (read old code hard enough to maintain for managers to decide for a partial rewrite). Part of it is about money transactions, part of it is a web application allowing our users to manage their accounts, see statistics, get their money, spend some of it on our services, etc.
So we thought about separating those two parts for real, from the sources to the machines involved. Involving the database server(s), that is compartmentalize the setup.
Here's the need: a part of the database will be on a first database server, the other part only on the second server. So we need a way to have the money tables on the former to appear on the latter, where to maintain statistics and allow our users to do their important management. And we need some of the web edited informations on the latter database to appear on the former, as that's the way it works.
We clearly have master/slave behavior here, data being accessed either in read write mode or in read only mode, depending on the server from where you access it.
It's about having data edited on a server and available locally on the other, in read only mode, but without need for the guarantee that commited transaction on server A has made it to server B, so asynchronous master slave mode of operation.
The big obvious choice would be Slony, but it's known to be uneasy to setup and maintain. The newcomer on the field, which took Slony as its specifications, is called londiste, from the SkyTools project.
This project is known to be easy to setup, at least compared to Slony-I, but to severely lack of documentation. Having so much read and heard about Slony-I high learning curve, though, I wanted to test londiste first.
The idea being that if londiste makes the job, no need to take on Slony-I yet.
The software is available on pgfoundry and easy enough to install on a debian system, as it supports it from the source code, using the yada debian package making facility. So it summaries as:
$ apt-get install postgresql-server-dev-8.2 python-psycopg $ apt-get install libevent-dev yada devscripts $ make deb82 $ dpkg -i skytools_2.1.3_i386.deb skytools-modules-8.2_2.1.3_i386.deb $ londiste.py -h
On a RedHat system (CentOS 4), I had to manually install python-egenix-mxdatetime to be able to have python-psycopg importable and run londiste.
The skytools projects really lacks documentation, but comes with a test suite, and a mailing-list which archives revealed soon to be a must-read!
Now, here's the quick howto for the impatient, from things read in the archives, in the documentation, in the source code, and learnt from errors:
- for londiste to run, you need a running pgq ticker;
- ticker has to be installed and run on the provider site;
- run londiste.py from the subscriber, even the provider installation step;
- install both pgq and londiste as a PostgreSQL superuser;
- then alter owner to the user writing into the replicated tables to the londiste schema and its tables, or GRANT your users the privileges to write into londiste.* schema and tables;
- create yourself the table to be replicated on the subscriber site;
- subscriber data are not enforced to be read-only, take care of it yourself;
- londiste replicates schema.table to schema.table.
Then some constraints to respect for being able to play with the tool, some of them requires reading the source (thanksfully written in python) to understand:
tables to replicate have to have a primary key defined;
on the subscriber site, don't define any foreign key constraint, this will only give you the following message:
ERROR found trigger on table xxx (143480)where xxx is the table name and the magic number its system oid;
database encoding on provider and subscriber must be the same (for copy to work sanely whatever the database content);
The configuration of both londiste and pgq being straightforward, won't be cover in this document. We'll assume you've done it by now, and we're ready to replicate asynchronoulsy some tables!
On the provider site, first run the ticker:
$ pgqadm.py -d ticker.ini ticker
Then, on the subscriber site, install londiste:
$ londiste.py conf.ini provider install $ londiste.py conf.ini subscriber install
Those two commands are to be run being connected as a PostgreSQL superuser. As the subscriber data will be read-only—remember that you have to enforce this yourself—, having data owned by a superuser seems acceptable. After installing londiste on provider, it's possible to alter ownership of londiste schema and its tables—or to simply GRANT permission on them—to a non-superuser user, the one who does the data updates for example.
When you're positive about the data users to be granted the privileges to use londiste.* tables (from triggers which insert, update and delete internal data), we can run londiste:
$ londiste.py conf.ini provider add foo bar baz $ londiste.py conf.ini subscriber add foo bar baz $ londiste.py -d conf.ini replay $ tail -f londiste.conf.log
And there it is, the replication is ongoing, beginning with some splitted COPY-ing of the data from the provider to the subscriber.
Of course, you can add tables to provider and subscriber after having launched the replication manager—the replay command—, you just need to have added a table to the provider before to add it to the subscriber.
Some easy to trap errors:
after adding a table to subscriber, provider site can't insert/update/delete into it
The user issuing the write-queries hasn't been GRANT-ed permissions to write into londiste.* tables.
WARNING No pidfile, process not running
You issued some add or setup command and canceled it, using kill or a keyboard interrupt, and now londiste seems to be lost.
First, it certainly is still running, check logs and main .pid file.
Now, check that the pid given into a %(name)s.copy.pid or %(name)s.setup.pid is no more running on your system, then remove the file.
After having made different installations, including 3 weeks of often disconnected provider/subscriber setup, and production provider to testing system subscriber where PostgreSQL is sometimes restarted, I have confidence into the software.
The production server where londiste has been installed showed no impact on performance and on resources usage—basic top and dstat like measuring, and slow queries monitoring—, expect for when adding or removing tables to replication: it create or drop a trigger, which requires some locking.
Some more testing will have to come, though, performance stress tests using Tsung load testing tool and data validity checking. It would also be neat to have some information about replication latency—maybe pgqadm.py ticker status is simply this.
Sparse (or inexistant) documentation, not precise enough error messages, some source reading and guess work... but the commands are easy to work with, and I feel like it was worth the effort.
Next steps with londiste includes:
- having some monitoring of it, that implies understanding its logs—{count, ignored, duration}—and writing the associated munin plugin;
- some more testing,
- some more documentation writing.
Thanks Skype for open-sourcing PostgreSQL related software!