Today, we are
releasing MoSQL, a
tool Stripe developed for live-replicating data from a MongoDB
database into a PostgreSQL database. With MoSQL, you can run
applications against a MongoDB database, but also maintain a
live-updated mirror of your data in PostgreSQL, ready for querying
with the full power of SQL.
Motivation
Here at Stripe, we use a number of different database technologies
for both internal- and external-facing services. Over time, we've
found ourselves with growing amounts of data in
MongoDB that we would like
to be able to analyze using SQL. MongoDB is great for a lot of
reasons, but it's hard to beat SQL for easy ad-hoc data
aggregation and analysis, especially since virtually every
developer or analyst already knows it.
An obvious solution is to periodically dump your MongoDB
database and re-import into PostgreSQL, perhaps
using mongoexport. We
experimented with this approach, but found ourselves frustrated
with the ever-growing time it took to do a full refresh. Even if
most of your analyses can tolerate a day or two of delay,
occasionally you want to ask ad-hoc questions about "what happened
last night?", and it's frustrating to have to wait on a huge
dump/load refresh to do that. In response, we built MoSQL,
enabling us to keep a real-time SQL mirror of our Mongo data.
MoSQL does an initial import of your MongoDB collections into a
PostgreSQL database, and then continues running, applying any
changes to the MongoDB server in near-real-time to the PostgreSQL
mirror. The replication works by tailing the
MongoDB oplog,
in essentially the same way Mongo's
own replication
works.
Usage
MoSQL can be installed like any other gem:
$ gem install mosql
To use MoSQL, you'll need to create
a collection
map which maps your MongoDB objects to a SQL schema. We'll
use the collection from
the MongoDB
tutorial as an example. A possible collection map for that
collection would look like:
mydb:
things:
:columns:
- _id: TEXT
- x: INTEGER
- j: INTEGER
:meta:
:table: things
:extra_props: true
Save that file as collections.yaml, start a local mongod
and postgres, and run:
$ mosql --collections collections.yaml
Now, run through
the MongoDB
tutorial, and then open a psql shell. You'll find all
your Mongo data now available in SQL form:
postgres=# select * from things limit 5;
_id
| x | j | _extra_props
--------------------------+---+---+------------------
50f445b65c46a32ca8c84a5d | | | {"name":"mongo"}
50f445df5c46a32ca8c84a5e | 3 | | {}
50f445e75c46a32ca8c84a5f | 4 | 1 | {}
50f445e75c46a32ca8c84a60 | 4 | 2 | {}
50f445e75c46a32ca8c84a61 | 4 | 3 | {}
(5 rows)
mosql will continue running, syncing any further changes
you make into Postgres.
For more documentation and usage information, see
the README.
mongoriver
MoSQL comes from a general philosophy of preferring real-time,
continuously-updating solutions to periodic batch jobs.
MoSQL is built on top
of mongoriver,
a general library for MongoDB oplog tailing that we
developed. Along with the MoSQL release, we have also released
mongoriver as open source today. If you find yourself wanting to
write your own MongoDB tailer, to monitor updates to your data in
near-realtime, check it out.
版权声明:本文发布于收集站云 内容均来源于互联网 如有侵权联系删除