no MCV list of tiny table with unique columns

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: no MCV list of tiny table with unique columns
Date: 2016-11-02 18:53:18
Message-ID: 20161102185318.GA3987@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We have a report query which joins (multiple times, actually) against this
trivial, tiny table:

ts=# \d bsm_to_switch
Table "public.bsm_to_switch"
Column | Type | Modifiers
--------+------+-----------
bsm | text | not null
switch | text | not null

ts=# SELECT length(bsm), length(switch) FROM bsm_to_switch;
length | length
--------+--------
10 | 6
10 | 6
(2 rows)

The column values are distinct.

I believe the join is being (badly) underestimated, leading to a crappy plan
involving multiple nested loop joins, which takes 2.5 hours instead of a
handful of seconds; I believe that might be resolved by populating its MCV
list..

..however, on reading commands/analyze.c, the issue is these columns have no
duplicates, and also postgres decides that "since the number of distinct rows
is greater than 10% of the total number of rows", that ndistinct should be -1
(meaning it scales with the table size). That's fine, except that it then
effectively precludes populating the MCV list.

| if (nmultiple == 0)
| {
| /*
| * If we found no repeated non-null values, assume it's a unique
| * column; but be sure to discount for any nulls we found.
| */
| stats->stadistinct = -1.0 * (1.0 - stats->stanullfrac);
| }
| else if (track_cnt < track_max && toowide_cnt == 0 &&
| nmultiple == track_cnt)
| {
| /*
| * Our track list includes every value in the sample, and every
| * value appeared more than once. Assume the column has just
| * these values. (This case is meant to address columns with
| * small, fixed sets of possible values, such as boolean or enum
| * columns. If there are any values that appear just once in the
| * sample, including too-wide values, we should assume that that's
| * not what we're dealing with.)
| */
| stats->stadistinct = track_cnt;
| }

ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE tablename='bsm_to_switch';
attname | inherited | null_frac | avg_width | n_distinct | most_common_vals
---------+-----------+-----------+-----------+------------+------------------
bsm | f | 0 | 11 | -1 |
switch | f | 0 | 7 | -1 |
(2 rows)

Any ideas? I tried setting n_distinct=2, but that seems to not have any effect
within ANALYZE itself.

ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE tablename='bsm_to_switch';
attname | inherited | null_frac | avg_width | n_distinct | most_common_vals
---------+-----------+-----------+-----------+------------+------------------
bsm | f | 0 | 11 | 2 |
switch | f | 0 | 7 | 2 |
(2 rows)

Thanks in advance.

Justin

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joao Junior 2016-11-02 19:06:05 archive_command too slow.
Previous Message Rick Otten 2016-11-02 14:55:52 Re: Perf decreased although server is better