Re: All Taxi Services need Index Clustered Heap Append

From: Alex Kane <alex(at)alexkane(dot)net>
To: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc: Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: All Taxi Services need Index Clustered Heap Append
Date: 2018-03-05 17:01:29
Message-ID: CAP-aJ2CAYufcd03zK0D79nMesZ7gXN702NLdZWrLdzH+NsW+FA@mail.gmail.com

https://aws.amazon.com/dms/

DMS might be helpful if you need to move off of RDS.
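
If it helps: for a PostgreSQL source, DMS's ongoing-replication mode
rides on logical decoding, which on RDS means setting
rds.logical_replication = 1 in the instance's parameter group and
rebooting. A quick sanity check from SQL once that's done:

  SHOW wal_level;  -- should report 'logical'
  SELECT slot_name, plugin, active FROM pg_replication_slots;
  -- DMS creates its own logical slot here when a task starts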

Alex Kane

On Mon, Mar 5, 2018 at 11:48 AM, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> wrote:

> On Mon, Mar 5, 2018 at 2:11 PM, Darafei "Komяpa" Praliaskouski
> <me(at)komzpa(dot)net> wrote:
> >> This approach mixes well with hash
> >> partitioning. It would be neat indeed if PostgreSQL do something
> >> equivalent on its own, and pluggable storage work being done could
> >> enable index organized tables that would help. But you probably need
> >> something right now.
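> >>
> >> For concreteness, a sketch of that mix using the hash-partitioning
> >> syntax that landed for PostgreSQL 11 (table and column names are
> >> made up):
> >>
> >>   CREATE TABLE driver_pos (
> >>       driver_id bigint NOT NULL,
> >>       ts        timestamptz NOT NULL,
> >>       pos       point
> >>   ) PARTITION BY HASH (driver_id);
> >>
> >>   CREATE TABLE driver_pos_0 PARTITION OF driver_pos
> >>       FOR VALUES WITH (MODULUS 4, REMAINDER 0);
> >>   -- ...and likewise for remainders 1-3, so rows for one driver
> >>   -- always land in the same, smaller partition.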
> >
> >
> > Fixing glaring issues (no vacuum and thus no Index-Only Scan on
> > append-only tables, vacuum processing all of the eternity of a btree)
> > by 11 will get most of the spike-nails out of the microservice code,
> > and we can probably live with them until 11 gets to RDS.
> >
> > I also don't see why pluggable storage is a must for the clustered
> > write. Postgres does have a mechanism for selecting the next page to
> > write a tuple to; right now it just looks at the FSM - but what if it
> > also peeked at an existing index that already has enough data to
> > route the tuple to the correct page on write?
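> >
> > (Today the closest built-in thing is a one-shot rewrite rather than
> > routing on write - a sketch, with made-up names:
> >
> >   CREATE INDEX driver_pos_id_ts ON driver_pos (id, ts);
> >   CLUSTER driver_pos USING driver_pos_id_ts;
> >   -- rewrites the whole table in index order, under an
> >   -- ACCESS EXCLUSIVE lock
> >
> > but CLUSTER doesn't maintain that order for new inserts, which is
> > exactly the gap here.)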
>
> The mechanism you outlined would likely work for your use case, but it
> has many issues that prevent it from being universally useful. Off the
> top of my head:
>
> * One extra index descent per insertion (I/O for this is necessary
> anyway, but CPU work is duplicated).
> * We don't currently track the amount of bloat. A mechanism that does
> this needs to be added (see the sketch after this list for the closest
> thing available today).
> * If the table hits the bloat limit there will be a sudden change in
> behavior. This is pretty nasty from an operations point of view.
> * With your (id,ts) clustering and data coming in mostly ordered by
> timestamp, after initial warmup each page will contain rows from a
> single id, but different ids are arbitrarily interleaved. This is
> better than the current state, but people might want an interleaving
> step bigger than 8kB to better utilize storage hardware.
> * It seems that with a common (ts) clustering, and the age of the
> timestamp coming from an exponential distribution, this will quickly
> bloat to the threshold and then insert data in a rather arbitrary
> order. This is much worse than the default behavior.
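>
> (On the bloat-tracking point above: the closest thing available today
> is the pgstattuple extension - a sketch, with a made-up table name:
>
>   CREATE EXTENSION pgstattuple;
>   SELECT dead_tuple_percent, free_percent
>     FROM pgstattuple('driver_pos');
>   -- exact numbers, but computed by a full scan of the table
>
> so it's a diagnostic, not the cheap running counter the mechanism
> would need.)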
>
> At least in my opinion these problems make it a special-case
> optimization that is hard to justify in core. A decent alternative
> would be a plugin mechanism for locating free space for a tuple, for
> which you could write an extension that finds a suitable location for
> the row.
>
> >> I guess I don't have to tell you that it looks like your needs have
> >> outgrown what RDS works well with and you are in for a painful move
> >> sooner or later.
> >
> >
> > Painful move where to? If we just run a Postgres instance without RDS
> > we'll get the pain of setting up Postgres and replication and backups
> > and autofailover, with no visible gain except if we get some private /
> > unaccepted patches applied to it. If we can get these things right
> > upstream, why would we want to switch?
>
> EC2 for example. Mainly because I3 instances and ephemeral storage
> provide an order of magnitude or two of performance improvement while
> costing less. Being able to run custom extensions and patches if
> necessary is a nice bonus. Yes, setting up replication, autofailover
> and backups is extra work that you have to weigh against the benefits.
> But don't overestimate the effort - there are some pretty nice tools
> available that make a proper cluster relatively simple to set up.
>
> > Per my colleagues, MySQL offers clustered indexes; also, MySQL is
> > available on RDS without the need for a "painful move", which is
> > doable by writing to two locations for a day and then pointing
> > readers to the new DB. But if we can instead do no move, and be sure
> > the issues are gone upstream before we hit the limit of the
> > spike-nails we're running on currently, wouldn't that be better? :)
>
> The move off of RDS is painful because getting data out of RDS
> involves either downtime or building an ad-hoc logical replication
> solution. You need to solve that regardless of where you move to.
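>
> (For the ad-hoc route, native logical replication is the obvious
> building block where the source supports it - a sketch with made-up
> names, assuming PostgreSQL 10+ on both ends:
>
>   -- on the RDS source (requires rds.logical_replication = 1):
>   CREATE PUBLICATION move_out FOR ALL TABLES;
>
>   -- on the new primary, after restoring the schema:
>   CREATE SUBSCRIPTION move_in
>       CONNECTION 'host=old-rds.example.com dbname=taxi user=repl'
>       PUBLICATION move_out;
>
> then cut over once the subscription has caught up.)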
>
> Providing an out-of-the-box solution in core PostgreSQL would of
> course be best, but realistically you will be waiting at least 2 years
> to get it on RDS. In the meantime, either the buffer partition
> approach I described or a buffering microservice in front of
> PostgreSQL, as Aleksander recommended, should fix data locality for
> you. If you weren't running on RDS I would even propose using Redis as
> the buffer, with one key per driver and redis_fdw to make the data
> accessible from within PostgreSQL.
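>
> (A sketch of that last setup, with made-up names and the option
> spellings from the redis_fdw README - treat it as illustrative only:
>
>   CREATE EXTENSION redis_fdw;
>   CREATE SERVER redis_srv FOREIGN DATA WRAPPER redis_fdw
>       OPTIONS (address '127.0.0.1', port '6379');
>   CREATE USER MAPPING FOR PUBLIC SERVER redis_srv;
>
>   -- one Redis list per driver, exposed via a key prefix:
>   CREATE FOREIGN TABLE driver_buffer (key text, val text)
>       SERVER redis_srv
>       OPTIONS (database '0', tabletype 'list', tablekeyprefix 'driver:');
>
> and a periodic job could then drain the buffer into the real table in
> (id, ts) order.)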
>
> Regards,
> Ants Aasma
> --
> +43-670-6056265
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com
>
>
