Re: Looks like merge join planning time is too big, 55 seconds

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Sergey Burladyan <eshkinkot(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Looks like merge join planning time is too big, 55 seconds
Date: 2013-08-01 19:51:04
Message-ID: CAMkU=1zuaFU7-DBJGawX+P+e4HjXmL+R_sxOLoB4Q18B5fRPHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Aug 1, 2013 at 12:13 PM, Sergey Burladyan <eshkinkot(at)gmail(dot)com> wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>
>> I'd use strace to find what file handle is being read and written, and
>> lsof to figure out what file that is.
>
> I use strace, it is more read then write:
> $ cut -d '(' -f 1 /var/tmp/pg.trace | sort | uniq -c | sort -n
> 49 select
> 708 close
> 1021 open
> 7356 write
> 212744 read
> 219650 lseek

Based on your iotop (or whatever that was that you posted previously)
most of the reads must be coming from the file system cache.

>
> top reads:
> 7859 read(150 open("base/16444/17685.129", O_RDWR|O_CREAT, 0600) = 150
> 9513 read(149 open("base/16444/17685.128", O_RDWR|O_CREAT, 0600) = 149
> 10529 read(151 open("base/16444/17685.130", O_RDWR|O_CREAT, 0600) = 151
> 12155 read(152 open("base/16444/17685.131", O_RDWR|O_CREAT, 0600) = 152
> 12768 read(154 open("base/16444/17685.133", O_RDWR|O_CREAT, 0600) = 154
> 16210 read(153 open("base/16444/17685.132", O_RDWR|O_CREAT, 0600) = 153
>
> it is 'items' table:
> select relname from pg_class where relfilenode = 17685;
> relname
> ---------
> items
>
> each read is 8192 bytes, so for EXPLAIN query with two simple index scan, *without* ANALYZE postgres
> read (7859 + 9513 + 10529 + 12155 + 12768 + 16210) * 8192 = 565 526 528 bytes from it.
>
>> It looks like it is more write than read, which does seem strange.
>
> Why it read something for simple EXPLAIN, without real executing query? :-)

I figured it was reading some system catalogs or something. I don't
know why it would be reading the table files. Or writing much of
anything, either.

I think the next step would be to run gdb -p <pid> (but don't start
gdb until backend is in the middle of a slow explain), then:

break read
c
bt

Then repeat the c and bt combination a few more times, to build up a
dataset on what the call stack is which is causing the reads to
happen.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2013-08-01 21:15:09 Re: subselect requires offset 0 for good performance.
Previous Message Scott Marlowe 2013-08-01 19:40:41 subselect requires offset 0 for good performance.