Re: Faster distinct query?

From: Michael Lewis <mlewis(at)entrata(dot)com>
To: Israel Brewster <ijbrewster(at)alaska(dot)edu>
Cc: Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, PostgreSQL Mailing Lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Faster distinct query?
Date: 2021-09-24 04:55:29
Message-ID: CAHOFxGpv95Ld6gsJ26bY-wnw_5u709-ZbubDmdx7VoJLNikxGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
> It’s only when I add in the AND data.channels=channels.channel that the
> query time blows up to 5+ minutes. I personally don’t understand why there
> would be such a large difference between the two queries - something wrong
> with my indexes?
>

Did you do the manual vacuum as suggested by Tom? Maybe at least check
pg_stat_user_tables for the last vacuum or autovacuum timestamp, and how
many records have changed since then. The system is deciding on an index
scan because that should be very fast, but if it needs to look at the
actual table table to determine if the process executing that query should
still be allowed to see that tuple (row version) then the index only scan
slows down a lot I believe. The huge number of heap fetches that Tom
pointed out mean that the system doesn't know that ALL processes should be
able to see those tuples. As someone else suggested lowering the max freeze
age, you might just do a manual "vacuum freeze" as well to allow future
auto vacuum processes to quickly skip over those pages as well.

I've heard of the loose indexscan before mentioned on this thread, but I'm
not seeing how to implement that for multiple columns. Anyone have an
example or pseudo code perhaps?

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Clive Swan 2021-09-24 10:48:00 RE: Get COUNT results from two different columns
Previous Message Jaime Solorzano 2021-09-23 22:10:21 Re: Postgres incremental backups per db (not per cluster)