Tom Lane wrote:
Why are the rowcount estimates so far off?  Maybe you need to increase
the statistics target for this table.
			regards, tom lane
  
Tom,
How does one tell the rowcount is off in a query plan?  I've never found a great reference on interpreting the query analyze output!

Upping the stats target made little difference in the estimated row count, but a huge difference in the query.  It also mattered
only on the context_key column, not the other columns.  Why would that be?


lyell5=> \pset format wrapped
lyell5=> select * from pg_stats where tablename='article_words';
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+
| schemaname |  tablename  | attname  | null_frac | avg_width | n_distinct |  most_common_vals  | most_common_freqs |  histogram_bounds   | correlation |
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+
| public     | article_wor | word_key |         0 |         4 |       6361 | {382,226,1155,2130 | {0.003,0.00233333 | {3,623,1376,2074,31 |   0.0292482 |
|            ; ds                                                          ; ,2246,3328,8999,73 ; ,0.00233333,0.002 ; 54,4601,7269,12770,               |
|                                                                          ; 3,1760,1796}       ; 33333,0.00233333, ; 30017,212496,582068               |
|                                                                                               ; 0.00233333,0.0023 ; 0}                                |
|                                                                                               ; 3333,0.002,0.002,                                     |
|                                                                                               ; 0.002}                                                |
| public     | article_wor | context_ |         0 |         4 |       4671 | {639302,113013,133 | {0.00233333,0.001 | {28,42838,92697,140 |    0.867505 |
|            ; ds          ; key                                           ; 052,211978,508496, ; 66667,0.00166667, ; 684,202950,248442,3               |
|                                                                          ; 545123,590829,5985 ; 0.00166667,0.0016 ; 38954,403025,498952               |
|                                                                          ; 95,649645,37658}   ; 6667,0.00166667,0 ; ,584048,654070}                   |
|                                                                                               ; .00166667,0.00166                                     |
|                                                                                               ; 667,0.00166667,0.                                     |
|                                                                                               ; 00133333}                                             |
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+
lyell5=> analyze verbose article_words;
INFO:  "article_words": scanned 3000 of 1125489 pages, containing 555000 live rows
and 0 dead rows; 3000 rows in sample, 208,215,465 estimated total rows

lyell5=> set default_statistics_target to 500;
lyell5=> analyze verbose article_words;
INFO:  "article_words": scanned 150000 of 1125489 pages, containing 27749820 live rows
and 180 dead rows; 150000 rows in sample, 208,214,114 estimated total rows



But it makes a huge impact on this query:

stats target
sampled
query duration after analyze
10
3000
80600ms
15
4500
64000ms
20
6000
4.2ms
30
9000
4.2ms
250
75000
4.2ms
500
?
4.2ms
1000
300000
4.2ms

Is there any good reason not to set stats target to 1000 always?




lyell5=> alter table article_words alter word_key set statistics 30;analyze verbose article_words;
explain analyze select words.* from article_words join words using (word_key) where context_key=535462;
INFO:  "article_words": scanned 9000 of 1125489 pages, containing 1665000 live rows and 0 dead rows; 9000 rows in sample, 208215465 estimated total rows
+------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                   QUERY PLAN                                                                   |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..160395.31 rows=16170 width=13) (actual time=0.034..4.025 rows=777 loops=1)                                            |
|   ->  Index Scan using article_word_idx on article_words  (cost=0.00..18258.96 rows=16170 width=4) (actual time=0.023..0.344 rows=777 loops=1) |
|         Index Cond: (context_key = 535462)                                                                                                     |
|   ->  Index Scan using words_pkey on words  (cost=0.00..8.78 rows=1 width=13) (actual time=0.003..0.004 rows=1 loops=777)                      |
|         Index Cond: (words.word_key = article_words.word_key)                                                                                  |
| Total runtime: 4.231 ms                                                                                                                        |
+------------------------------------------------------------------------------------------------------------------------------------------------+


lyell5=> alter table article_words alter word_key set statistics 10;analyze verbose article_words;
explain analyze select words.* from article_words join words using (word_key) where context_key=535462;
INFO:  "article_words": scanned 3000 of 1125489 pages, containing 555000 live rows and 0 dead rows; 3000 rows in sample, 208215465 estimated total rows
+------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                   QUERY PLAN                                                                   |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Hash Join  (cost=192089.54..265653.38 rows=44589 width=13) (actual time=18809.184..80685.239 rows=777 loops=1)                                 |
|   Hash Cond: (article_words.word_key = words.word_key)                                                                                         |
|   ->  Index Scan using article_word_idx on article_words  (cost=0.00..44717.05 rows=44589 width=4) (actual time=0.025..0.597 rows=777 loops=1) |
|         Index Cond: (context_key = 535462)                                                                                                     |
|   ->  Hash  (cost=93818.13..93818.13 rows=5653313 width=13) (actual time=18808.872..18808.872 rows=5651551 loops=1)                            |
|         ->  Seq Scan on words  (cost=0.00..93818.13 rows=5653313 width=13) (actual time=0.007..7845.824 rows=5651551 loops=1)                  |
| Total runtime: 80686.217 ms                                                                                                                    |
+------------------------------------------------------------------------------------------------------------------------------------------------+