From: | Konstantin Gredeskoul <kigster(at)gmail(dot)com> |
---|---|
To: | Matej <gmatej(at)gmail(dot)com>, Thomas Boussekey <thomas(dot)boussekey(at)gmail(dot)com> |
Cc: | Melvin Davidson <melvin6925(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: PG Sharding |
Date: | 2018-01-29 16:16:22 |
Message-ID: | MWHPR03MB3039CA0E934BB5262E5DD38EFCE50@MWHPR03MB3039.namprd03.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
When I worked at Wanelo, we built a sharded data store for a giant join table with 4B records and growing. We too could not find any generic sharding solution at the level of postgresql, and after some research decided to implement it in the application.
As it was written in ruby, here are some resources to point out:
https://github.com/taskrabbit/makara
https://github.com/wanelo/sequel-schema-sharding
The service used Sequel gem (not active record from Rails) and has been working very stable for us. I'm not sure if your project is in ruby or not, but wanted to give it a shout.
Another good resource is this ActivityFeed library, which relies on pluggable backends to support very high write to maintain precomputed activity feeds for each user using Redis. I'm a bit fan of moving things out of postgresql that don't have to be there :)
https://github.com/kigster/simple-feed
Best,
Konstantin
__
Konstantin Gredeskoul
https://kig.re/
https://reinvent.one/
(415) 265-1054
________________________________
From: Matej <gmatej(at)gmail(dot)com>
Sent: Monday, January 29, 2018 7:49:19 AM
To: Thomas Boussekey
Cc: Melvin Davidson; pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: PG Sharding
Hi Thomas.
Thanks.
Also looked at those solutions:
- PGXL Am a ltille afraid we would be the test dummies. Did not hear of many production installs.
- Citus seems heavily limited scalability vise, because of the master node design.
Regarding partitioning we are considering ourselves pg_pathman. Was hoping on PG10 partitioning but currently not really many changes performance vise.
Overall we are still considering manual APP/sharding as this seems to be the most scalable approach which least added latency. The builtin solutions seems to introduce extra lag and I am afraid of what to do when something goes wrong. then it's not a black box anymore and you have to study the details.
For node loss we plan a master -slave setup, and there will not be so many cross shard queries.
BR
Matej
2018-01-29 16:15 GMT+01:00 Thomas Boussekey <thomas(dot)boussekey(at)gmail(dot)com<mailto:thomas(dot)boussekey(at)gmail(dot)com>>:
Hello,
Facing the same situation, I'm considering 3 solutions:
- Sharding with postgres_xl (waiting for a Pg10 release)
- Sharding with citusdata (Release 7.2, compatible with Pg10 and pg_partman, seems interesting)
- Partitioning with PG 10 native partitioning or pg_partman
With colleagues, we have tested the 3 scenarios.
Sharding looks interesting, but you have to apprehend its behaviour in case of node loss, or cross-node queries.
Thomas
2018-01-29 15:44 GMT+01:00 Melvin Davidson <melvin6925(at)gmail(dot)com<mailto:melvin6925(at)gmail(dot)com>>:
On Mon, Jan 29, 2018 at 9:34 AM, Matej <gmatej(at)gmail(dot)com<mailto:gmatej(at)gmail(dot)com>> wrote:
Hi Everyone.
We are looking at a rather large fin-tech installation. But as scalability requirements are high we look at sharding of-course.
I have looked at many sources for Postgresql sharding, but we are a little confused as to shared with schema or databases or both.
So far our understanding:
SCHEMA.
PROS:
- seems native to PG
- backup seems easier
- connection pooling seems easier, as you can use same connection between shard.
CONS:
- schema changes seems litlle more complicated
- heard of backup and maintenance problems
- also some caching problems.
DATABASE:
PROS:
- schema changes litlle easier
- backup and administration seems more robust
CONS:
- heard of vacuum problems
- connection pooling is hard, as 100 shards would mean 100 pools
So what is actually the right approach? If anyone could shed some light on my issue.
Thanks
You might also want to consider GridSQL. IIRC it was originally developed by EnterpriseDB. I saw a demo of it a few years ago and it was quite impressive,
but I've had no interaction with it since, so you will have to judge for yourself.
https://sourceforge.net/projects/gridsql/?source=navbar
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you. [http://us.i1.yimg.com/us.yimg.com/i/mesg/tsmileys2/01.gif]
From | Date | Subject | |
---|---|---|---|
Next Message | Kumar, Virendra | 2018-01-29 16:19:53 | pgpool Connections Distributions Among Nodes |
Previous Message | Bruce Momjian | 2018-01-29 16:15:49 | Re: [GENERAL] Matching statement and duration log lines |