Re: Force a merge join?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Doug Fields <dfields-pg-general(at)pexicom(dot)com>
Cc: "Ian Harding" <ianh(at)tpchd(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Force a merge join?
Date: 2002-05-18 23:34:47
Message-ID: 24882.1021764887@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Doug Fields <dfields-pg-general(at)pexicom(dot)com> writes:
> I find that this is actually fairly typical, where the row estimates and
> actual rows are off by orders of magnitudes.

One would like to think that 7.2 is better than previous releases,
especially for relatively simple queries such as this.

> pexitest=# select count(distinct list_id) from test_list_entries;
> count
> -------
> 308
> (1 row)

As opposed to the pg_stats estimate of 189 ... not too bad, really.
Is the most-common-values distribution shown in the pg_stats output
reasonably correct? (Specifically, 38 has nearly 40% of the entries,
192 another 12%, and everything else 1.5% or less)

Another question is exactly what version you are running. I tried
plugging the stats values you gave into pg_statistic by hand, and
got this plan from current sources:

Merge Join (cost=29720.79..29843.05 rows=423 width=140)
Merge Cond: ("outer".lower_email = "inner".lower_email)
-> Sort (cost=11371.23..11393.77 rows=9016 width=72)
Sort Key: a.lower_email
-> Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries a (cost=0.00..10565.78 rows=9016 width=72)
Index Cond: ((list_id = 148) OR (list_id = 146) OR (list_id = 145) OR (list_id = 147) OR (list_id = 144))
-> Sort (cost=18349.56..18385.50 rows=14377 width=68)
Sort Key: b.lower_email
-> Index Scan using test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries b (cost=0.00..17013.92 rows=14377 width=68)
Index Cond: ((list_id = 247) OR (list_id = 433) OR (list_id = 249) OR (list_id = 434) OR (list_id = 238) OR (list_id = 340) OR (list_id = 339) OR (list_id = 418))
(10 rows)

ie, it's estimating about 1800 matches per list_id value, which seems
pretty reasonable given that it knows none of these values are in the
most_common list. Now I don't see anything in the CVS logs to suggest
that the estimation of this query would've changed since late in 7.2
beta cycle, so I'm confused why you don't get similar results.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Doug Fields 2002-05-18 23:54:44 Re: Force a merge join?
Previous Message Tom Lane 2002-05-18 23:24:07 Re: Force a merge join?