From: | Marc Cousin <cousinmarc(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: query plan not optimal |
Date: | 2013-12-19 19:00:16 |
Message-ID: | 52B34240.4020905@gmail.com |
Lists: | pgsql-performance |
On 19/12/2013 19:33, Jeff Janes wrote:
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.56..4001768.10 rows=479020 width=26) (actual
> time=2.303..15371.237 rows=479020 loops=1)
> Output: path.pathid, batch.filename
> Buffers: shared hit=2403958 read=7539
> -> Seq Scan on public.batch (cost=0.00..11727.20 rows=479020
> width=85) (actual time=0.340..160.142 rows=479020 loops=1)
> Output: batch.path, batch.filename
> Buffers: shared read=6937
> -> Index Scan using idx_path on public.path (cost=0.56..8.32
> rows=1 width=16) (actual time=0.030..0.031 rows=1 loops=479020)
> Output: path.pathid, path.path
> Index Cond: (path.path = batch.path)
> Buffers: shared hit=2403958 read=602
> Total runtime: 15439.043 ms
>
>
> As you can see, more than twice as fast, and a very high hit ratio
> on the path table, even though we start from a cold cache (I did flush
> both the PostgreSQL and OS caches here). We get an excellent hit ratio
> because the batch table contains few distinct paths (several files per
> directory) and is already quite clustered, as it comes from a backup,
> which is of course performed directory by directory.
>
>
> What is your effective_cache_size set to?
>
> Cheers,
>
> Jeff
Yeah, I had forgotten to set it correctly on this test environment (its
value is correctly set in production environments). Setting it to a few
gigabytes here gives me this cost:
bacula=# explain select pathid, filename from batch join path using (path);
QUERY PLAN
----------------------------------------------------------------------------
Nested Loop (cost=0.56..2083904.10 rows=479020 width=26)
-> Seq Scan on batch (cost=0.00..11727.20 rows=479020 width=85)
-> Index Scan using idx_path on path (cost=0.56..4.32 rows=1 width=16)
Index Cond: (path = batch.path)
(4 rows)
It still chooses the hash join, though by a smaller margin.
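One way to see the two estimates side by side (these are standard PostgreSQL planner switches, not something from the thread) is to disable the hash join for the session and re-run EXPLAIN:

```sql
-- Steer the planner away from the hash join so the nested-loop
-- estimate can be compared against the default plan's cost.
SET enable_hashjoin = off;
EXPLAIN SELECT pathid, filename FROM batch JOIN path USING (path);
RESET enable_hashjoin;
```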
And it will still only access a very small part of path (always the same
~5000 records) during the query, which isn't accounted for in the cost,
if I understand correctly?
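For what it's worth, effective_cache_size enters the cost through the Mackert–Lohman approximation that PostgreSQL uses to estimate page fetches for repeated inner index scans (index_pages_fetched() in costsize.c). Below is a simplified Python sketch of that formula; it is only illustrative and ignores the real code's step of apportioning the cache across all tables used by the query:

```python
def index_pages_fetched(tuples_fetched, table_pages, cache_pages):
    """Mackert-Lohman estimate of the number of page fetches when
    `tuples_fetched` random heap accesses hit a table of `table_pages`
    pages, with `cache_pages` of cache (effective_cache_size) available.
    A page evicted from cache and touched again counts as a new fetch,
    so the result can exceed `table_pages` when the cache is small.
    """
    T = max(table_pages, 1.0)
    b = max(cache_pages, 1.0)
    Ns = float(tuples_fetched)
    if T <= b:
        # Table fits in cache: fetches approach T asymptotically.
        return min(2.0 * T * Ns / (2.0 * T + Ns), T)
    lim = 2.0 * T * b / (2.0 * T - b)
    if Ns <= lim:
        return 2.0 * T * Ns / (2.0 * T + Ns)
    # Cache saturated: each extra tuple costs (T - b)/T of a page fetch.
    return b + (Ns - lim) * (T - b) / T
```

With a tiny cache, 479020 probes of a (hypothetical) 10000-page table are estimated at roughly 430000 page fetches; once the cache covers the table, the estimate is capped at the table size. That is consistent with the per-loop index scan cost dropping from 8.32 to 4.32 after raising the setting.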
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2013-12-19 20:14:16 | Re: slow query - will CLUSTER help? |
Previous Message | Jeff Janes | 2013-12-19 18:33:04 | Re: query plan not optimal |