Re: Gist indexing performance with cidr types

From: Henrik Thostrup Jensen <htj(at)nordu(dot)net>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Gist indexing performance with cidr types
Date: 2015-08-26 11:29:01
Message-ID: alpine.DEB.2.11.1508261245250.35330@pyrite
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 26 Aug 2015, Emre Hasegeli wrote:

>> Are the coverage operatons just that expensive?
>
> They shouldn't be. A similar query like yours works in 0.5 second on my laptop:
[snip]

I get the same from your testcase.

> Maybe, something we haven't expected about your dataset causes a
> performance regression on the index. Did you see anything relevant on
> the server logs on index creation time?

I tried dropping and re-creating the index. The only log entry was for the
drop statement.

The distribution of the data is not uniform like the data set you produce.
Though I find it hard to believe that it would affect this as much.

select masklen(route), count(*) from routes group by masklen(route);

masklen | count
---------+---------
8 | 47
9 | 30
10 | 84
11 | 225
12 | 580
13 | 1163
14 | 2401
15 | 4530
16 | 32253
17 | 20350
18 | 35583
19 | 76307
20 | 111913
21 | 132807
22 | 229578
23 | 286986
24 | 1149793

Rest is rather small, though with bumps at /32 and /48 (typical IPv6 prefix length).

Real-world address space is very fragmented, where as some is unused.

Then there is the mixed IPv6 and IPv4 data that might factor in.

I tried the approach from your benchmark, to try make a more isolated test
case:

irrexplorer=> SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn = 2914;
SELECT 732

irrexplorer=> explain analyze select routes.route from routes join hmm on routes.route && hmm.route;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.41..511914.27 rows=2558 width=7) (actual time=8.096..17209.778 rows=8127 loops=1)
-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual time=0.010..0.609 rows=732 loops=1)
-> Index Only Scan using route_gist on routes (cost=0.41..470.32 rows=22900 width=7) (actual time=4.823..23.502 rows=11 loops=732)
Index Cond: (route && (hmm.route)::inet)
Heap Fetches: 0
Planning time: 0.971 ms
Execution time: 17210.627 ms
(7 rows)

The only difference in the query plan is that the above used an index
only, where as your test case used index scan (it did this for me as
well). I tried without index only scan:

irrexplorer=> set enable_indexonlyscan =false;
SET

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.41..571654.27 rows=2558 width=7) (actual time=6.406..15899.791 rows=8127 loops=1)
-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual time=0.011..0.615 rows=732 loops=1)
-> Index Scan using route_gist on routes (cost=0.41..551.93 rows=22900 width=7) (actual time=4.490..21.712 rows=11 loops=732)
Index Cond: ((route)::inet && (hmm.route)::inet)
Planning time: 0.505 ms
Execution time: 15900.669 ms
(6 rows)

Slight faster, but nothing significant. Something seems wonky.

Best regards, Henrik

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Emre Hasegeli 2015-08-26 13:46:07 Re: Gist indexing performance with cidr types
Previous Message Emre Hasegeli 2015-08-26 09:47:13 Re: Gist indexing performance with cidr types