Re: query plan not optimal

From: Marc Cousin <cousinmarc(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query plan not optimal
Date: 2013-12-19 19:00:16
Message-ID: 52B34240.4020905@gmail.com
Lists: pgsql-performance

On 19/12/2013 19:33, Jeff Janes wrote:
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.56..4001768.10 rows=479020 width=26) (actual time=2.303..15371.237 rows=479020 loops=1)
>   Output: path.pathid, batch.filename
>   Buffers: shared hit=2403958 read=7539
>   -> Seq Scan on public.batch (cost=0.00..11727.20 rows=479020 width=85) (actual time=0.340..160.142 rows=479020 loops=1)
>         Output: batch.path, batch.filename
>         Buffers: shared read=6937
>   -> Index Scan using idx_path on public.path (cost=0.56..8.32 rows=1 width=16) (actual time=0.030..0.031 rows=1 loops=479020)
>         Output: path.pathid, path.path
>         Index Cond: (path.path = batch.path)
>         Buffers: shared hit=2403958 read=602
> Total runtime: 15439.043 ms
>
>
> As you can see, it is more than twice as fast, with a very high hit
> ratio on the path table, even starting from a cold cache (here both
> the PostgreSQL and OS caches were cold). The hit ratio is excellent
> because the batch table contains few distinct paths (several files per
> directory) and is already fairly well clustered, as it comes from a
> backup, which is of course performed directory by directory.
>
>
> What is your effective_cache_size set to?
>
> Cheers,
>
> Jeff
Yeah, I had forgotten to set it correctly in this test environment (it is
set correctly in production). Setting it to a few gigabytes here gives me
this cost:

bacula=# explain select pathid, filename from batch join path using (path);
                                 QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop (cost=0.56..2083904.10 rows=479020 width=26)
   -> Seq Scan on batch (cost=0.00..11727.20 rows=479020 width=85)
   -> Index Scan using idx_path on path (cost=0.56..4.32 rows=1 width=16)
         Index Cond: (path = batch.path)
(4 rows)

It still chooses the hash join, but by a smaller margin.
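A rough back-of-the-envelope check of where the two Nested Loop totals come from. It assumes the total is approximately the outer Seq Scan cost, plus one inner Index Scan probe per outer row, plus cpu_tuple_cost (PostgreSQL default 0.01) per joined row, with exactly one match per outer row as the rows=1 estimate suggests. This is a simplification of the planner's real formula, and the helper name `nestloop_estimate` is hypothetical, but it lands within 0.1% of both plans and shows that the per-probe index cost dominates: lowering it from 8.32 to 4.32 roughly halves the total.

```python
CPU_TUPLE_COST = 0.01  # PostgreSQL's default cpu_tuple_cost


def nestloop_estimate(outer_cost, outer_rows, inner_cost_per_probe):
    """Approximate nested-loop total cost (simplified, hypothetical helper):
    outer scan + one inner index probe per outer row + cpu_tuple_cost per
    joined row, assuming exactly one inner match per outer row."""
    return (outer_cost
            + outer_rows * inner_cost_per_probe
            + outer_rows * CPU_TUPLE_COST)


ROWS = 479020          # Seq Scan on batch: estimated rows
BATCH_SCAN = 11727.20  # Seq Scan on batch: total cost

# Inner Index Scan total cost per probe, as displayed in each plan,
# next to the Nested Loop total the planner reported.
for label, probe_cost, planner_total in (
    ("earlier plan (effective_cache_size not yet tuned)", 8.32, 4001768.10),
    ("later plan (effective_cache_size = a few GB)", 4.32, 2083904.10),
):
    estimate = nestloop_estimate(BATCH_SCAN, ROWS, probe_cost)
    rel_diff = abs(estimate - planner_total) / planner_total
    print(f"{label}: estimate={estimate:.1f} planner={planner_total:.1f} "
          f"diff={rel_diff:.3%}")
```

The small residual differences come from the displayed per-probe costs being rounded to two decimals and from details of the real costing that this sketch ignores.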

And the query will still only access a very small part of path (always
the same 5000 records), which, if I understand correctly, isn't accounted
for in the cost?
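To illustrate why the observed hit ratio is so high, here is a small simulation of a deliberately tiny LRU cache fed with 479020 lookups over 5000 distinct paths. It is a hypothetical model, not PostgreSQL's buffer manager: it counts row lookups rather than buffer pages, and it assumes each path's files are stored contiguously in batch, as they would be for a backup taken directory by directory. The clustered order misses only once per distinct path, while an order that cycles through all 5000 paths misses on every lookup. As far as I understand, the order of the outer rows, which is what makes the difference here, is not something the per-probe index cost estimate takes into account.

```python
from collections import OrderedDict


def lru_hits(keys, capacity):
    """Return (hits, misses) for a sequence of keys fed through an LRU cache."""
    cache = OrderedDict()
    hits = misses = 0
    for key in keys:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            misses += 1
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits, misses


N_ROWS = 479020   # rows in batch, from the plans above
N_PATHS = 5000    # distinct paths touched, from the message
CAPACITY = 16     # hypothetical, deliberately tiny cache

# Clustered: each path's rows form one contiguous run (directory by directory).
clustered = [i * N_PATHS // N_ROWS for i in range(N_ROWS)]
# Cycling: visits all 5000 paths in turn, so a key only comes back after
# 4999 other keys have been seen (7919 is coprime with 5000).
cycling = [(i * 7919) % N_PATHS for i in range(N_ROWS)]

for name, order in (("clustered", clustered), ("cycling", cycling)):
    hits, misses = lru_hits(order, CAPACITY)
    print(f"{name}: hits={hits} misses={misses} hit ratio={hits / N_ROWS:.2%}")
```

With these inputs the clustered order gives 474020 hits and 5000 misses (about 98.96%), while the cycling order gives no hits at all, even though both orders touch exactly the same 5000 keys.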
