From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [RFC] speed up count(*) |
Date: | 2021-10-20 18:22:10 |
Message-ID: | A4A9A8FF-768B-4A4D-B8EA-E41483401313@anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On October 20, 2021 10:57:50 AM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>John Naylor <john(dot)naylor(at)enterprisedb(dot)com> writes:
>> Perennially our users have complaints about slow count(*) when coming from
>> some other systems. Index-only scans help, but I think we can do better. I
>> recently wondered if a BRIN index could be used to answer min/max aggregate
>> queries over the whole table, and it turns out it doesn't. However, then it
>> occurred to me that if we had an opclass that keeps track of the count in
>> each page range, that would be a way to do a fast count(*) by creating the
>> right index. That would require planner support and other work, but it
>> seems doable. Any opinions on whether this is worth the effort?
>
>The core reason why this is hard is that we insist on giving the right
>answer. In particular, count(*) is supposed to count the rows that
>satisfy the asker's snapshot. So I don't see a good way to answer it
>from an index only, given that we don't track visibility accurately
>in indexes.
Yeah.
If we really wanted to, we could accelerate unqualified count(*) substantially by computing the count inside heapam. There's a *lot* of overhead associated with returning tuples, grouping them, etc. Especially with all_visible set that's bound to be way faster (I'd guess are least 3-5x) if done in heapam (like we do the visibility determinations in heapgetpage for all tuples on a page at once).
But it's doubtful the necessary infrastructure is worth it. Perhaps that changes with the infrastructure some columnar AMs are asking for. They have a need to push more stuff down to the AM that's more generic than just count(*).
Regards,
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2021-10-20 18:23:20 | Re: [RFC] speed up count(*) |
Previous Message | Tom Lane | 2021-10-20 17:57:50 | Re: [RFC] speed up count(*) |