Re: single table - fighting a seq scan

From: Michael Lewis <mlewis(at)entrata(dot)com>
To: Radoslav Nedyalkov <rnedyalkov(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: single table - fighting a seq scan
Date: 2020-07-14 18:45:38
Message-ID: CAHOFxGqfRa7uPkG2dEv5U2rt_ZLORVvBYGXLK7t7ZG8HGQgeeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

rows=3832
rows=3870

Your estimate changed very little when you included 100 values vs 200
values. That is interesting to me.

What does the below query give you? How many of those 200 values are found
in the MCVs list? If n_distinct is low, and most of the values are NOT in
the most common value list, and the fraction of the table covered by the
MCVs is also low, then the planner will expect that each of the 200 values
represents some deceivingly high portion of the table.

You said there are 80 million rows, yes? That seems likely that ndistinct
and the MCVs list are not giving info very correlated with reality. You may
want to increase the minimum table size for sequential to kick in. I cannot
recall the name of that setting at the moment. You may also want to
increase stats target on that column at least, analyze, and explain the
query again.

SELECT
( SELECT SUM (x) FROM UNNEST (most_common_freqs) x ) frac_MCV,
tablename,
attname,
inherited,
null_frac,
n_distinct,
array_length(most_common_vals,1) n_mcv,
array_length(histogram_bounds,1) n_hist,
correlation,
*
FROM pg_stats
WHERE
schemaname = 'public'
AND tablename='te'
AND attname='current_pid';

>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Radoslav Nedyalkov 2020-07-14 19:31:39 Re: single table - fighting a seq scan
Previous Message Marc Millas 2020-07-14 18:43:51 Re: how to "explain" some ddl