From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: James Finnerty <jfinnert(at)amazon(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Date: 2023-01-06 14:40:36
Message-ID: 4ee816f5-6146-7745-a6b4-f38a1b24105a@enterprisedb.com
Lists: pgsql-hackers
I've pushed the comment/assert cleanup.
Here's a cleaned up version of the relkind check. This is almost
identical to the patch from yesterday (plus the renames and updates of
comments for modified functions).
The one difference is that I realized the relkind check does not
actually say we can't do sampling - it just means we can't use
TABLESAMPLE to do it. We could still use "random()" ...
Furthermore, I don't think we should silently disable sampling when the
user explicitly requests TABLESAMPLE by specifying bernoulli/system for
the table - IMHO it's less surprising to just fail in that case.
So we now do this:
    if (!can_tablesample && (method == ANALYZE_SAMPLE_AUTO))
        method = ANALYZE_SAMPLE_RANDOM;
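To make that behavior concrete, here is a minimal standalone C sketch of the
decision (illustrative only, not the actual postgres_fdw code; the enum,
helper names and error handling are stand-ins, and the relkind letters are
the usual pg_class.relkind values):

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* Stand-in for the sampling methods the ANALYZE option can request. */
typedef enum
{
    ANALYZE_SAMPLE_OFF,
    ANALYZE_SAMPLE_AUTO,
    ANALYZE_SAMPLE_RANDOM,
    ANALYZE_SAMPLE_SYSTEM,
    ANALYZE_SAMPLE_BERNOULLI
} SampleMethod;

/*
 * Plain tables ('r') and materialized views ('m') clearly support
 * TABLESAMPLE; views ('v'), sequences ('S') etc. do not.
 */
static bool
relkind_supports_tablesample(char relkind)
{
    return (relkind == 'r' || relkind == 'm');
}

static SampleMethod
choose_sample_method(SampleMethod method, char relkind)
{
    bool    can_tablesample = relkind_supports_tablesample(relkind);

    /* AUTO quietly falls back to random() sampling. */
    if (!can_tablesample && (method == ANALYZE_SAMPLE_AUTO))
        return ANALYZE_SAMPLE_RANDOM;

    /*
     * An explicit bernoulli/system request on an unsupported relkind fails
     * (in the backend this would be an ereport(ERROR)).
     */
    if (!can_tablesample &&
        (method == ANALYZE_SAMPLE_SYSTEM || method == ANALYZE_SAMPLE_BERNOULLI))
    {
        fprintf(stderr, "remote relation does not support TABLESAMPLE\n");
        exit(EXIT_FAILURE);
    }

    return method;
}

int
main(void)
{
    /* A foreign table pointing at a remote view, with the default AUTO. */
    SampleMethod m = choose_sample_method(ANALYZE_SAMPLE_AUTO, 'v');

    printf("chosen method: %s\n",
           m == ANALYZE_SAMPLE_RANDOM ? "random()" : "tablesample");
    return 0;
}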
Yes, we may still disable sampling when reltuples is -1, 0 or similarly
unusable. But that's a condition expected for new relations, and one that is
likely to fix itself once the relation gets analyzed, which is not the case
for relkind.
Of course, all relkinds that don't support TABLESAMPLE currently have a
reltuples value that disables sampling anyway (e.g. views have -1). So in
practice we won't fall back to random() either, because we can't calculate
the sample fraction.
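For context, the sample fraction is computed from the target number of
sample rows and the remote reltuples estimate, so a reltuples of -1 or 0
leaves nothing to work with. A simplified sketch of that calculation (not
the actual postgres_fdw code):

#include <stdio.h>
#include <stdbool.h>

/*
 * Compute the sampling fraction from the target sample size and the remote
 * reltuples estimate.  Returns false when reltuples is -1 (never analyzed,
 * e.g. a view) or 0, in which case sampling has to be disabled.
 */
static bool
compute_sample_frac(double reltuples, int targrows, double *sample_frac)
{
    if (reltuples <= 0)
        return false;

    *sample_frac = (double) targrows / reltuples;

    /* clamp - we can't sample more than the whole table */
    if (*sample_frac > 1.0)
        *sample_frac = 1.0;

    return true;
}

int
main(void)
{
    double  frac;

    /* A remote view reports reltuples = -1, so sampling can't be used. */
    if (!compute_sample_frac(-1, 30000, &frac))
        printf("sampling disabled (unknown reltuples)\n");
    return 0;
}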
That's a bit annoying for foreign tables pointing at a view, which is a more
likely use case than a table pointing at a sequence. It's also more of an
issue, because a view may return many rows (while a sequence has only a
single row).
But I realized we could actually still do "random()" sampling:
SELECT * FROM t ORDER BY random() LIMIT $X;
where $X is the target number of sample rows for the table. That would
result in plans like this (given sufficient work_mem values):
                            QUERY PLAN
-------------------------------------------------------------------
 Limit (actual rows=30000 loops=1)
   ->  Sort (actual rows=30000 loops=1)
         Sort Key: (random())
         Sort Method: top-N heapsort  Memory: 3916kB
         ->  Append (actual rows=1000000 loops=1)
Even with lower work_mem values this would likely be a win, thanks to the
savings on network transfers.
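To tie the numbers together: ANALYZE's target sample size is 300 * the
statistics target, so the default target of 100 gives the 30000 rows seen in
the plan above. A rough sketch of how the fallback query could be put
together (the relation name is hypothetical and this is not the actual
deparsing code):

#include <stdio.h>

int
main(void)
{
    /*
     * 300 * statistics target is the sample size ANALYZE aims for; the
     * default target of 100 gives 30000 rows.
     */
    int     stats_target = 100;
    int     targrows = 300 * stats_target;
    char    sql[256];

    /* "public.some_view" is a made-up remote relation name. */
    snprintf(sql, sizeof(sql),
             "SELECT * FROM public.some_view ORDER BY random() LIMIT %d",
             targrows);

    printf("%s\n", sql);
    return 0;
}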
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment: 0001-Check-relkind-before-using-TABLESAMPLE-in-postgres_f.patch (text/x-patch, 7.2 KB)