Re: Faster distinct query?

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Israel Brewster <ijbrewster(at)alaska(dot)edu>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Michael Lewis <mlewis(at)entrata(dot)com>, PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Faster distinct query?
Date: 2021-09-23 01:41:52
Message-ID: CAApHDvot+Yr_AWd0ggQk=iM+oUtV3Epm6vmWq_kf0fcBifvLeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 23 Sept 2021 at 13:21, Israel Brewster <ijbrewster(at)alaska(dot)edu> wrote:
> Ah, yes indeed. That version runs in about 30 seconds rather than 5 minutes! See the explain analyze output here: https://explain.depesz.com/s/L5Bf It looks more complicated, but being able to run parallel definitely makes a difference, and there may be some other improvements in there that I’m not aware of as well!

That's good. You should also look into the VACUUM thing mentioned by
Tom. If this table is just receiving INSERTs and not UPDATE/DELETEs
then you might want to consider tweaking the auto-vacuum settings for
it.

The default autovacuum_vacuum_insert_scale_factor will mean that
auto-vacuum will only kick off a worker to vacuum this table when 20%
of the total rows have been inserted since the last vacuum. It's
possible that might account for your large number of heap fetches.

If the table is insert-only, then you could drop the
autovacuum_vacuum_insert_scale_factor down a bit. In the command
below, I set it to 2%. Also dropping the autovacuum_freeze_min_age is
a pretty good thing to do for tables that are never or almost never
are UPDATEd or DELETEd from.

alter table data set (autovacuum_vacuum_insert_scale_factor=0.02,
autovacuum_freeze_min_age=0);

Vacuuming an insert-only table more often is not a great deal of extra
work, and it's possible even less work if you were to vacuum before
recently inserted pages got evicted from shared_buffers or the
kernel's buffers. The already vacuumed and frozen portion of the
table will be skipped using the visibility and freeze map, which is
very fast to do.

David

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ryan Booz 2021-09-23 01:49:48 Re: Faster distinct query?
Previous Message David Rowley 2021-09-23 01:31:36 Re: Faster distinct query?