From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Merging statistics from children instead of re-sampling everything |
Date: | 2022-02-10 22:37:26 |
Message-ID: | 4e86ae74-4e2c-b40f-4405-035d2f818e5d@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2/10/22 12:50, Andrey Lepikhov wrote:
> On 21/1/2022 01:25, Tomas Vondra wrote:
>> But I don't have a very good idea what to do about statistics that we
>> can't really merge. For some types of statistics it's rather tricky to
>> reasonably merge the results - ndistinct is a simple example, although
>> we could work around that by building and merging hyperloglog counters.
>
> I think, as a first step on this way we can reduce a number of pulled
> tuples. We don't really needed to pull all tuples from a remote server.
> To construct a reservoir, we can pull only a tuple sample. Reservoir
> method needs only a few arguments to return a sample like you read
> tuples locally. Also, to get such parts of samples asynchronously, we
> can get size of each partition on a preliminary step of analysis.
> In my opinion, even this solution can reduce heaviness of a problem
> drastically.
>
Oh, wow! I haven't realized we're fetching all the rows from foreign
(postgres_fdw) partitions. For local partitions we already do that,
because that uses the usual acquire function, with a reservoir
proportional to partition size. I have assumed we use tablesample to
fetch just a small fraction of rows from FDW partitions, and I agree
doing that would be a pretty huge benefit.
I actually tried hacking that together - there's a couple problems with
that (e.g. determining what fraction to sample using bernoulli/system),
but in principle it seems quite doable. Some minor changes to the FDW
API may be necessary, not sure.
Not sure about the async execution - that seems way more complicated,
and the sampling reduces the total cost, async just parallelizes it.
That being said, this thread was not really about foreign partitions,
but about re-analyzing inheritance trees in general. And sampling
foreign partitions doesn't really solve that - we'll still do the
sampling over and over.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2022-02-10 22:45:17 | Re: Add jsonlog log_destination for JSON server logs |
Previous Message | Andres Freund | 2022-02-10 22:26:59 | Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To: |