Highly Variable Planning Times

From: Michael Malis <michaelmalis2(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Highly Variable Planning Times
Date: 2017-04-19 20:39:40
Message-ID: CAFQtOwqWdDFZ-1Xv2=b43=ScNjsi5nOQ7MQyEz6TEXW1cf=3Pg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've been encountering highly variable planning time on PG 9.5.4.
Running `EXPLAIN SELECT * FROM events_1171738` will take anywhere from
200ms to 4s. This likely has to do with the table having 1300 partial
indexes on it (for reasons elaborated on in
https://blog.heapanalytics.com/running-10-million-postgresql-indexes-in-production/)
I'm trying to figure out if there is something I can do to eliminate
the slow planning from happening.

When I used `strace -c` on the backend process, I found that the
number of `semop` system calls is much higher for the slower queries
than the faster ones. After digging in a bit more I found that when
the query was slow, there would be a single semaphore which would make
up most of the `semop` calls. Here is a snippet of the output when I
ran `strace -r` on a query with high planning time:

0.000225 open("base/16384/13236647", O_RDWR) = 219
0.000110 lseek(219, 0, SEEK_END) = 16384
0.000122 semop(1736711, [{9, -1, 0}], 1) = 0
0.004110 semop(1769480, [{10, 1, 0}], 1) = 0
0.000197 semop(1736711, [{9, -1, 0}], 1) = 0
0.010096 semop(1540097, [{1, 1, 0}], 1) = 0
0.000176 semop(1736711, [{9, -1, 0}], 1) = 0
0.004069 semop(1638404, [{15, 1, 0}], 1) = 0
0.000244 open("base/16384/13236648", O_RDWR) = 220
0.000131 lseek(220, 0, SEEK_END) = 16384
0.000212 semop(1736711, [{9, -1, 0}], 1) = 0
0.007851 semop(1736711, [{8, 1, 0}], 1) = 0
0.000130 semop(1736711, [{9, -1, 0}], 1) = 0
0.000101 semop(1769480, [{5, 1, 0}], 1) = 0
0.000450 open("base/16384/13236653", O_RDWR) = 221
0.000151 lseek(221, 0, SEEK_END) = 180224
0.000141 semop(1736711, [{9, -1, 0}], 1) = 0
0.000239 semop(1736711, [{9, -1, 0}], 1) = 0
0.000419 semop(1736711, [{10, 1, 0}], 1) = 0
0.000127 semop(1736711, [{9, -1, 0}], 1) = 0
0.001493 semop(1703942, [{3, 1, 0}], 1) = 0

You can see that most of the semop calls are waits on semaphore #9 in
semaphore set 1736711. I looked up the relation for the files being
opened and they all seem to be indexes on the table. Based on some
flame graphs I generated, I believe most of the time is spent inside
of the function get_relation_info. I think the open/lseek calls are
coming from the estimate_rel_size call in get_relation_info, but I
haven't been able to figure out where the semop calls are coming from.

It seems that the semop calls are at least a symptom of the high
planning time. I'm looking for help in finding the root cause of the
high planning time and for figuring out a way for eliminating the high
planning time.

Thanks,
Michael

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-04-19 20:47:14 Re: Highly Variable Planning Times
Previous Message Merlin Moncure 2017-04-19 20:36:16 Re: Ongoing issues with representation of empty arrays