Re: partitions vs indexes

From: "Enrico Thierbach" <eno(at)open-lab(dot)org>
To: "Michael Lewis" <mlewis(at)entrata(dot)com>
Cc: "PostgreSQL General" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: partitions vs indexes
Date: 2019-10-03 10:29:34
Message-ID: 7E29E5D6-56A8-4043-A35A-368430DD9190@open-lab.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 2 Oct 2019, at 22:16, Michael Lewis wrote:

> "I would like to convert a table with a primary key into a partitioned
> setup by a column which is not part of the primary key"
>
> That isn't possible. The partition key must be contained by the
> primary
> key. That is, the primary key could be site_id, id and you can create
> hash
> partition on id or site_id but not created_on.
>
> You could drop primary key and foreign keys and implement them via
> trigger
> functions as described in this blog series, but it seems questionable-
> https://www.depesz.com/2018/10/31/foreign-key-to-partitioned-table-part-2/
>
> I do not assume the restriction would be dropped in future releases. I
> don't know that scanning all the partitions to figure out whether the
> primary key is violated would be advisable. Which is what the trigger
> functions described in the blog post has to do, right?
>
> It might be noteworthy that partitioning with more than 10-100
> partitions
> is MUCH faster in PG12 than PG11 (up to 4-8 thousand partitions) from
> testing shared by those working on that code.
>
>>

Michael, thank you for your response. I think I now grasp the idea: if
there is a uniqueness constraint, then the database would rather not
visit all partitions to check for constraint violation, but want to
identify the single partition that might fail that; hence any
partitioning value must be a subset of or be identical to the uniqueness
constraint.

I get that this makes sense if you have *many* partitions; however, I
basically want to end up with two partitions, “hot” and “cold”.
A row’s lifetime starts in a hot partition, and, after being
processed, moves into the cold partition.

Most of the work actually happens in the hot partition, so I think
having this as small as possible is probably helpful. For numbers: the
hot partition would tyically contain ~10000 rows, the cold partition, on
the other hand, will have 10s of millions. At the same time I still want
to be able to look up a row by its id in the root relation, not in the
concrete partitions. Having the database validate a uniqueness
constraint in two tables instead of in one would be a worthwhile
sacrifice for me.

Having said that I just realized I could probably reach my goal by
setting up explicit hot and cold tables, move rows around manually
whenever their “hotness” changes, and set up a view which combines
both tables into a single relation. I would only have to drop all
explicit FK references from the schema. A downside, certainly, but one
that I could live with.

Best,
/eno

--
me on github: http://github.com/radiospiel

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2019-10-03 10:40:20 Re: Advice for geographically dispersed multi master
Previous Message Nikolai Lusan 2019-10-03 09:05:04 Advice for geographically dispersed multi master