Re: Gist indexing performance with cidr types

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Henrik Thostrup Jensen <htj(at)nordu(dot)net>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Gist indexing performance with cidr types
Date: 2015-08-26 16:08:01
Message-ID: CAMkU=1x=puaY9M-xk6=9TZQbWe6EBSMDQFVGNPHGRrM9f_Gc7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Aug 26, 2015 at 4:29 AM, Henrik Thostrup Jensen <htj(at)nordu(dot)net>
wrote:

> On Wed, 26 Aug 2015, Emre Hasegeli wrote:
>
> Are the coverage operatons just that expensive?
>>>
>>
>> They shouldn't be. A similar query like yours works in 0.5 second on my
>> laptop:
>>
> [snip]
>
> I get the same from your testcase.
>
> Maybe, something we haven't expected about your dataset causes a
>> performance regression on the index. Did you see anything relevant on
>> the server logs on index creation time?
>>
>
> I tried dropping and re-creating the index. The only log entry was for the
> drop statement.
>
> The distribution of the data is not uniform like the data set you produce.
> Though I find it hard to believe that it would affect this as much.
>
> select masklen(route), count(*) from routes group by masklen(route);
>

Any chance you can share the actual underlying data? I noticed it wasn't
on github, but is that because it is proprietary, or just because you don't
think it is interesting?

> irrexplorer=> explain analyze select routes.route from routes join hmm on
> routes.route && hmm.route;
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.41..511914.27 rows=2558 width=7) (actual
> time=8.096..17209.778 rows=8127 loops=1)
> -> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual
> time=0.010..0.609 rows=732 loops=1)
> -> Index Only Scan using route_gist on routes (cost=0.41..470.32
> rows=22900 width=7) (actual time=4.823..23.502 rows=11 loops=732)
> Index Cond: (route && (hmm.route)::inet)
> Heap Fetches: 0
> Planning time: 0.971 ms
> Execution time: 17210.627 ms
> (7 rows)
>

If you loop over the 732 rows yourself, issuing the simple query against
each retrieved constant value:

explain (analyze,buffers) select routes.route from routes where route && $1

Does each one take about the same amount of time, or are there some outlier
values which take much more time than the others?

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tory M Blue 2015-08-26 19:14:04 Index creation running now for 14 hours
Previous Message Emre Hasegeli 2015-08-26 13:46:07 Re: Gist indexing performance with cidr types