Re: Slow recursive CTE query questions, with row estimate and n_distinct issues

From: Christopher Baines <mail(at)cbaines(dot)net>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Slow recursive CTE query questions, with row estimate and n_distinct issues
Date: 2020-12-29 22:40:39
Message-ID: 875z4k9qxk.fsf@cbaines.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Michael Lewis <mlewis(at)entrata(dot)com> writes:

> On Mon, Dec 28, 2020 at 7:51 AM Christopher Baines <mail(at)cbaines(dot)net> wrote:
>
>> derivation_inputs:
>> COUNT(*): 285422539
>> reltuples: 285422528
>>
>> derivation_id:
>> COUNT(DISTINCT): 7508610
>> n_distinct: 4336644 (~57% of the true value)
>>
>> derivation_output_id:
>> COUNT(DISTINCT): 5539406
>> n_distinct: 473762 (~8% of the true value)
>>
>
> If you expect the ratio of distinct of derivation_output_id values to be
> roughly linear going forward, you can set a custom value for n_distinct on
> the column (currently seems like -.0194, aka distinct count
> of derivation_output_id divided by reltuples of the table). You could also
> do this analysis every month or six and set the custom value as needed.
>
> https://www.postgresql.org/docs/current/sql-altertable.html
>
> I am not sure if it will resolve your query problems though.

Thanks Michael, I didn't realise a custom value could be set, but I'll
look in to this.

I actually managed to speed the query up enough by increasing
work_mem/shared_buffers. I didn't realise one of the sequential scans
was executing 14 times, but giving PostgreSQL more resources means it
just executes once, which really helps.

Thanks again,

Chris

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Baines 2020-12-29 22:45:55 Re: Slow recursive CTE query questions, with row estimate and n_distinct issues
Previous Message Greg Spiegelberg 2020-12-28 16:09:09 Re: Slow recursive CTE query questions, with row estimate and n_distinct issues