Re: Postgresql 13 partitioning advice

From: Ameya Bidwalkar <bidwalkar(dot)ameya10(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgresql 13 partitioning advice
Date: 2022-08-03 11:05:25
Message-ID: CA+OaHWJUp83WOrEpvwwbQ07eu7n1yWoFa4s=yF=pwT_oqO-Kzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello David,

Thank you for the valuable inputs.We will test these scenarios .

Regards,
Ameya

On Tue, Aug 2, 2022 at 12:16 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Tue, 2 Aug 2022 at 19:48, Ameya Bidwalkar
> <bidwalkar(dot)ameya10(at)gmail(dot)com> wrote:
> > We have a Postgresql 13 database where we have a single table with
> several millions of rows . We plan to partition it based on timestamp .
> > We have been seeking advice for best practices for building this.
> > This table will get lots of updates for the same rows during a short
> period of time.During this time rows would be in a single partition .
> > After this short time these rows would move to another partition
> .Where no more updates take place on these rows.But might have some SELECT
> queries running.
> > We plan to l have partitions based on months and then roll them up in a
> year and then archive these older partitions
> > One consultant we talked with told us this row movement between the
> partitions will have
> > huge complications .But this was an issue during the Postgres 10
> version .
>
> Define "huge complications"?
>
> The capabilities of partitioned tables have changed quite a bit since
> the feature was added. It's very easy for knowledge to get out-dated
> in this area. I did quite a bit of work on them and I struggle to
> remember off the top of my head which versions saw which improvements.
> PG12 saw lots. See [1], search for "partition".
>
> One possible complication is what is mentioned in [2] about
> "serialization failure error". UPDATEs that cause a tuple to move to
> another partition can cause a serialization failure at transaction
> isolation level, not just serializable transactions. If it's not
> already, you might want to have your application retry transactions on
> SQL:40001 errors.
>
> Apart from that, assuming there's comparatively a small number of rows
> in the partition being updated compared to the partition with the
> static rows, then it sounds fairly efficient. As you describe it, the
> larger static partition is effectively INSERT only and auto-vacuum
> will need to touch it only for tuple freezing work. The smaller of
> the two tables will receive more churn but will be faster to vacuum.
> PG13 got a new feature that makes sure auto-vacuum also does the
> rounds on INSERT-only tables too, so the static partition is not going
> to be neglected until anti-wrap-around-autovacuums trigger, like they
> would have in PG12 and earlier.
>
> Another thing to consider is that an UPDATE of a non-partitioned table
> has a chance at being a HOT update. That's possible if the tuple can
> fit on the same page and does not update any of the indexed columns. A
> HOT update means no indexes need to be updated so these perform faster
> and require less space in WAL than a non-HOT update. An UPDATE that
> moves a tuple to another partition can never be a HOT update. That's
> something you might want to consider. If you're updating indexed
> columns already then it's not a factor to consider. There's also
> overhead to postgres having to find the partition for the newly
> updated version of the tuple. That's not hugely expensive, but it's
> generally measurable. RANGE partitioned tables with a large number of
> partitions will have the most overhead for this. HASH partitioned
> tables, the least.
>
> The best thing you can likely do is set up a scenario with pgbench and
> compare the performance. pgbench is a pretty flexible tool that will
> allow you to run certain queries X% of the time and even throttle the
> workload at what you expect your production server to experience. You
> could then run it overnight on a test server, or even for weeks and
> see how auto-vacuum keeps up when compared to the non-partitioned
> case. You can also check how much extra WAL is generated vs the
> non-partitioned case.
>
> > So we are seeking advice on the performance perspective and things we
> should take care of along with manual vacuums on a regular schedule and
> indexing.
> > Are there any tunables I should experiment with in particular ?
>
> Perhaps if you want to keep a small high-chun table in check you might
> want to consider if autovacuum_naptime is set low enough. You may not
> care if the space being consumed in the standard 1min
> autovacuum_naptime is small enough not to be of concern.
>
> David
>
> [1] https://www.postgresql.org/docs/release/12.0/
> [2] https://www.postgresql.org/docs/13/sql-update.html
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message bruno da silva 2022-08-03 15:12:49 Re: PostgresSQL 9.5.21 very slow to connect and perform basic queries
Previous Message Rory Campbell-Lange 2022-08-02 19:18:16 Re: PgSQL 14 - Logical Rep - Single table multiple publications?