From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | no MCV list of tiny table with unique columns |
Date: | 2016-11-02 18:53:18 |
Message-ID: | 20161102185318.GA3987@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
We have a report query which joins (multiple times, actually) against this
trivial, tiny table:
ts=# \d bsm_to_switch
Table "public.bsm_to_switch"
Column | Type | Modifiers
--------+------+-----------
bsm | text | not null
switch | text | not null
ts=# SELECT length(bsm), length(switch) FROM bsm_to_switch;
length | length
--------+--------
10 | 6
10 | 6
(2 rows)
The column values are distinct.
I believe the join is being (badly) underestimated, leading to a crappy plan
involving multiple nested loop joins, which takes 2.5 hours instead of a
handful of seconds; I believe that might be resolved by populating its MCV
list..
..however, on reading commands/analyze.c, the issue is these columns have no
duplicates, and also postgres decides that "since the number of distinct rows
is greater than 10% of the total number of rows", that ndistinct should be -1
(meaning it scales with the table size). That's fine, except that it then
effectively precludes populating the MCV list.
| if (nmultiple == 0)
| {
| /*
| * If we found no repeated non-null values, assume it's a unique
| * column; but be sure to discount for any nulls we found.
| */
| stats->stadistinct = -1.0 * (1.0 - stats->stanullfrac);
| }
| else if (track_cnt < track_max && toowide_cnt == 0 &&
| nmultiple == track_cnt)
| {
| /*
| * Our track list includes every value in the sample, and every
| * value appeared more than once. Assume the column has just
| * these values. (This case is meant to address columns with
| * small, fixed sets of possible values, such as boolean or enum
| * columns. If there are any values that appear just once in the
| * sample, including too-wide values, we should assume that that's
| * not what we're dealing with.)
| */
| stats->stadistinct = track_cnt;
| }
ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE tablename='bsm_to_switch';
attname | inherited | null_frac | avg_width | n_distinct | most_common_vals
---------+-----------+-----------+-----------+------------+------------------
bsm | f | 0 | 11 | -1 |
switch | f | 0 | 7 | -1 |
(2 rows)
Any ideas? I tried setting n_distinct=2, but that seems to not have any effect
within ANALYZE itself.
ts=# SELECT attname, inherited, null_frac, avg_width, n_distinct, most_common_vals FROM pg_stats WHERE tablename='bsm_to_switch';
attname | inherited | null_frac | avg_width | n_distinct | most_common_vals
---------+-----------+-----------+-----------+------------+------------------
bsm | f | 0 | 11 | 2 |
switch | f | 0 | 7 | 2 |
(2 rows)
Thanks in advance.
Justin
From | Date | Subject | |
---|---|---|---|
Next Message | Joao Junior | 2016-11-02 19:06:05 | archive_command too slow. |
Previous Message | Rick Otten | 2016-11-02 14:55:52 | Re: Perf decreased although server is better |