Re: Faster distinct query?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Israel Brewster <ijbrewster(at)alaska(dot)edu>, PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Faster distinct query?
Date: 2021-09-22 23:48:39
Message-ID: 2245342.1632354519@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Michael Lewis <mlewis(at)entrata(dot)com> writes:
> On Wed, Sep 22, 2021 at 2:48 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The "index-only" scan is reported to do 86m heap fetches along the
>> way to returning 812m rows, so the data is apparently pretty dirty.

> Do you say that because you would expect many more than 10 tuples per page?

No, I say that because if the table were entirely all-visible, there
would have been *zero* heap fetches. As it stands, it's reasonable
to suspect that a pretty sizable fraction of the index-only scan's
runtime went into random-access heap fetches made to verify
visibility of individual rows.

(You will, of course, never get to exactly zero heap fetches in an
IOS unless the table data is quite static. But one dirty page
out of every ten seems like there were a lot of recent changes.
A VACUUM to clean that up might be well worthwhile.)

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David Rowley 2021-09-23 00:58:31 Re: Faster distinct query?
Previous Message Michael Lewis 2021-09-22 22:51:51 Re: Faster distinct query?