From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Marc Cousin <cousinmarc(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query plan not optimal
Date: 2013-12-19 20:36:19
Message-ID: 1387485379.74465.YahooMailNeo@web162902.mail.bf1.yahoo.com
Lists: pgsql-performance
Marc Cousin <cousinmarc(at)gmail(dot)com> wrote:
> Then we insert the missing paths. This is one of the plans that fail:
> insert into path (path)
> select path from batch
> where not exists
> (select 1 from path where path.path=batch.path)
> group by path;
I know you said you wanted to focus on a different query, but this
one can easily be optimized. Right now it checks for an existing
row in path once for each row in batch, when it only needs to
check once for each distinct path. One way to write it would be:
insert into path (path)
select path from (select distinct path from batch) b
where not exists
(select 1 from path p where p.path = b.path);
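If you want to compare how the two forms are planned, something
like this works (just a sketch using the table names from your
example); note that EXPLAIN ANALYZE actually runs the insert, so
wrap it in a transaction and roll back if you only want to see the
plan:
begin;
explain (analyze, buffers)
insert into path (path)
select path from (select distinct path from batch) b
where not exists
(select 1 from path p where p.path = b.path);
rollback;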
> So now we insert into the file table.
>
> insert into file (pathid,filename)
> select pathid, filename from batch join path using (path);
> What I think is the cause of the problem is that the planner
> doesn't take into account that we are going to fetch the exact
> same values all the time in the path table, so we'll have a very
> good hit ratio.
It kinda takes that into account for the index part of things via
the effective_cache_size setting. That should normally be set to
50% to 75% of machine RAM.
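For example, on a 16GB machine like the one I tested on below,
something in this range is a reasonable starting point (shown here
as a session-level setting; the same value can go in
postgresql.conf for all sessions):
show effective_cache_size;          -- often still at the conservative default
set effective_cache_size = '12GB';  -- roughly 75% of 16GB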
> Maybe the n_distinct from batch.path could be used to refine the
> caching effect on the index scan ?
Interesting idea.
> For now, we work around this by using very low values for
> seq_page_cost and random_page_cost for these 2 queries.
If you are not already doing so, you might want to try setting
cpu_tuple_cost to something in the 0.03 to 0.05 range. I have
found that the default is too low compared to other cpu cost
factors, and raising it makes plan choice less sensitive to the
exact page cost settings -- that is, you get good plans over a
wider range of page cost values. I have sometimes been unable to get a good
plan for a query without boosting this, regardless of what I do
with other settings.
Running with a development build on my 16GB development PC, I got
your fast plan with your "big data" test case by making only this
one adjustment from the postgresql.conf defaults:
set effective_cache_size = '2GB';
I also got the fast plan if I left effective_cache_size at the
default and only changed:
set cpu_tuple_cost = 0.03;
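If you do end up keeping per-query overrides as a stopgap for
those two statements, SET LOCAL keeps them from leaking into the
rest of the session, since the values revert at commit or
rollback. A sketch, with the cost numbers purely illustrative:
begin;
set local cpu_tuple_cost = 0.03;
set local seq_page_cost = 0.5;
set local random_page_cost = 1.5;
insert into file (pathid, filename)
select pathid, filename from batch join path using (path);
commit;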
I know that there have been adjustments to cost calculations for
use of large indexes in both minor and major releases. If a little
sensible tuning of cost factors to better match reality doesn't do
it for you, you might want to consider an upgrade.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company