From: | Julien Cigar <jcigar(at)ulb(dot)ac(dot)be> |
---|---|
To: | Ants Aasma <ants(at)cybertec(dot)at> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: bad plan |
Date: | 2012-04-06 11:16:39 |
Message-ID: | 4F7ED097.70807@ulb.ac.be |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 04/05/2012 21:47, Ants Aasma wrote:
> On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar<jcigar(at)ulb(dot)ac(dot)be> wrote:
>> - http://www.pastie.org/3731956 : with default config
>> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off
> It looks like the join selectivity of (context_to_context_links,
> ancestors) is being overestimated by almost two orders of magnitude.
> The optimizer thinks that there are 564 rows in the
> context_to_context_links table for each taxon_id, while in fact for
> this query the number is 9. To confirm that this, you can force the
> selectivity estimate to be 200x lower by adding a geo_id = geod_id
> where clause to the subquery.
adding a geo_id = geo_id to the subquery helped a little bit with a
cpu_tuple_cost of 0.1: http://www.pastie.org/3738224 :
without:
Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..146.93 rows=341 width=8) (actual time=0.004..0.019 rows=9 loops=736)
with geo_id = geo_id:
Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..148.11 rows=2 width=8) (actual time=0.004..0.020 rows=9 loops=736)
> If it does help, then the next question would be why is the estimate
> so much off. It could be either because the stats for
> context_to_context_links.taxon_id are wrong or because
> ancestors.taxon_id(subphylum_id = 18830) is a special case. To help
> figuring this is out, you could run the following to queries and post
> the results:
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS
> dist GROUP BY 1 ORDER BY 1;
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL
> and taxon_id= ANY ( select taxon_id from rab.ancestors where
> ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY
> 1 ORDER BY 1;
I'm sorry but I get an "ERROR: division by zero" for both of your queries..
> If the second distribution has a significantly different shape then
> cross column statistics are necessary to get good plans. As it happens
> I'm working on adding this functionality to PostgreSQL and would love
> to hear more details about your use-case to understand if it would be
> solved by this work.
Thank you for your help,
Julien
> Regards,
> Ants Aasma
--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.
Attachment | Content-Type | Size |
---|---|---|
jcigar.vcf | text/x-vcard | 292 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2012-04-06 17:11:37 | Re: Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster |
Previous Message | Tomas Vondra | 2012-04-05 20:06:49 | Re: H800 + md1200 Performance problem |