Re: design partioning scheme for selecting from latest partition

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Niels Jespersen <NJN(at)dst(dot)dk>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: design partioning scheme for selecting from latest partition
Date: 2021-03-22 19:04:05
Message-ID: CA+bJJbx6CQ2O9oAcSBeGi-puV14SZzc0O6VYAu50ndO5895vEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Niels:

On Mon, Mar 22, 2021 at 3:40 PM Niels Jespersen <NJN(at)dst(dot)dk> wrote:
...
> version int not null,
...
> ) partition by list (version);
>
> -- Add tens of partitions
> -- Load millions of rows in each partition
> -- Then I want to be able to do this wothout scanning all partitions for the highest version number.
>
> select s.* from s where s.version = (select max(version) from s);

> I could add an index on the version column. But the only use would be to the newest partition, so that seems a bit like overkill, indexing 100 of milliomns of rows.

Without an index, or some caching, you would need to scan partitions.
Potentially several. Because you could have a partition for (0,3,6),
other for (1,2,8) and a another for 4. Are you sure list is better,
giving your message hints you make them increasing? I mean, even if
you do no tuse all versions, having ranges means you can query the
schema, order the partitions in desceding order in the range, query
each one and it must be either empty or contain the maximum. You could
do this if your lists are increasing too ( i.e, you have 1,2,3 and
10,12,14, and 100,200, but you haven't told that so we cannot assume
it ). With free lists, like the ones I've put above, you may scan
1,2,8 expecting an 8 to find max is a 2, then 0,3,6 expecting a 6 to
find max is a 3 and then 4 and find a max there, and without indexes
or insider knowledge every scan will have to be a full scan.

Even if you can do something like that, without an index you will need
a full scan, or do some trigger magic and keep a cache ( just keep
versio, count(*) on a table and maintain it ). If your partitions are
ordered, you can always keep the last one indexed, or if you know
versions do not decrease, you may keep things cached. This seems to be
the kind of problem where the generic solution is hard but a little
insider knowledge can accelerate it a lot.

Regards.
Francisco Olarte.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Saha, Sushanta K 2021-03-22 20:37:20 CSV From Oracle with timestamp column getting errors
Previous Message Edward Donahue III 2021-03-22 15:02:55 gdal32-libs-3-2-2-13.rhel bad dependency