Re: Gist indexing performance with cidr types

From: Henrik Thostrup Jensen <htj(at)nordu(dot)net>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: Gist indexing performance with cidr types
Date: 2015-08-28 14:05:14
Message-ID: alpine.DEB.2.11.1508281513050.14499@pyrite
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi

On Thu, 27 Aug 2015, Emre Hasegeli wrote:

> I think the slowdown is not related with the key your searched for,
> but the organisation of the index. We have a simple structure for
> the index keys. Basically, common bits of the child nodes are stored
> on the parent node. It leads to not efficient indexes, where there
> are too much values with the same prefix. I couldn't quite understand
> why it performs so bad, though.

I can see the issue. Unfortunately IP space tends to be fragmented in some
ranges, and very sparse in other.

It is unfortunate that something to index IP prefixes doesn't handle BGP
and IRR data very well (the only largish "real" datasets with IP prefixes
I can think of).

> You might have better luck with ip4r extension [1] or creating an index
> using the range types like this:
[snip]

Using the range type index:

Nested Loop (cost=0.42..603902.92 rows=8396377 width=26) (actual time=0.514..662.500 rows=8047 loops=1)
-> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual time=0.015..0.119 rows=732 loops=1)
-> Index Scan using routes_cidr_to_range_idx on routes (cost=0.42..595.58 rows=22941 width=19) (actual time=0.262..0.903 rows=11 loops=732)
Index Cond: (inetrange(set_masklen((route)::inet, 0), set_masklen(broadcast((route)::inet), 0)) && inetrange(set_masklen((hmm.route)::inet, 0), set_masklen(broadcast((hmm.route)::inet), 0)))
Planning time: 0.211 ms
Execution time: 662.769 ms
(6 rows)

Boom. This is actually usefull.

It does take 70 seconds for the biggst network though. The index is also
rather large:

public | routes_cidr_to_range_idx | index | htj | routes | 158 MB |

Table is 119MB data. The gist index was 99 MB.

Best regards, Henrik

Henrik Thostrup Jensen <htj at nordu.net>
Software Developer, NORDUnet

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message 挨踢人 2015-08-31 13:51:54 is there any way we can push join predicate into inner table
Previous Message Jean Cavallo 2015-08-27 17:21:22 Server slowing down over time