Re: Faster distinct query?

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Michael Lewis <mlewis(at)entrata(dot)com>
Cc:	"David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Israel Brewster <ijbrewster(at)alaska(dot)edu>, PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Faster distinct query?
Date:	2021-09-22 23:48:39
Message-ID:	2245342.1632354519@sss.pgh.pa.us
Lists:	pgsql-general

Michael Lewis <mlewis(at)entrata(dot)com> writes:
> On Wed, Sep 22, 2021 at 2:48 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The "index-only" scan is reported to do 86m heap fetches along the
>> way to returning 812m rows, so the data is apparently pretty dirty.

> Do you say that because you would expect many more than 10 tuples per page?

No, I say that because if the table were entirely all-visible, there
would have been *zero* heap fetches. As it stands, it's reasonable
to suspect that a pretty sizable fraction of the index-only scan's
runtime went into random-access heap fetches made to verify
visibility of individual rows.

(You will, of course, never get to exactly zero heap fetches in an
IOS unless the table data is quite static. But one dirty page
out of every ten seems like there were a lot of recent changes.
A VACUUM to clean that up might be well worthwhile.)

regards, tom lane
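
As a rough check on the "one out of every ten" estimate in the message above, the two figures quoted from the EXPLAIN output (86m heap fetches, 812m rows returned) can be compared directly. The sketch below is only arithmetic on those rounded numbers; the function name heap_fetch_ratio is made up for illustration and is not part of PostgreSQL. It also assumes heap fetches are counted per row, so the result is a proxy for how much of the table is not marked all-visible rather than an exact page count.

```python
# Figures as quoted in the thread, rounded to millions in the original message.
HEAP_FETCHES = 86_000_000
ROWS_RETURNED = 812_000_000


def heap_fetch_ratio(heap_fetches: int, rows_returned: int) -> float:
    """Fraction of returned rows that needed a heap visit to verify visibility.

    Hypothetical helper for illustration only. A fully all-visible table
    would give 0.0, since an index-only scan could then skip the heap.
    """
    if rows_returned <= 0:
        raise ValueError("rows_returned must be positive")
    return heap_fetches / rows_returned


ratio = heap_fetch_ratio(HEAP_FETCHES, ROWS_RETURNED)
print(f"{ratio:.1%} of returned rows needed a heap fetch")  # 10.6%
print(f"roughly 1 row in {1 / ratio:.1f}")                  # 9.4
```

A ratio near zero would suggest the visibility map is mostly up to date; a value around 10%, as here, is consistent with the suggestion that a VACUUM might be worthwhile.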

In response to

Re: Faster distinct query? at 2021-09-22 22:51:51 from Michael Lewis
