Re: Normal case or bad query plan?

From: Mischa Sandberg <ischamay(dot)andbergsay(at)activestateway(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Normal case or bad query plan?
Date: 2004-10-12 16:56:53
Message-ID: p1Uad.14359$Ia5.5691@edtnps89
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

This may sound more elaborate than it's worth, but I don't know of
a better way to avoid a table scan.

You want to index on a computed value that is a common prefix of your
FROM and TO fields.

The next step is to search on a fixed SET of prefixes of different
lengths. For example, some of your ranges might be common in the first 3
bytes of ipaddr, some in two, some in only one.

You create and index on one common prefix of either 1,2 or 3 bytes, for
each row.

Your query then looks something like (pardon my ignorance in PGSQL)

select *
from ip2location
where ip2prefix in (
network(:myaddr || '/8'),
network(:myaddr || '/16'),
network(:myaddr || '/24'),
:myaddr --- assuming single-address ranges are possible
)
and :myaddr between ip_address_from and ip_address_to

Although this looks a little gross, it hits very few records.
It also adapts cleanly to a join between ip2location and a table of
ip addrs.

Gabriele Bartolini wrote:
> Hi guys,
>
> please consider this scenario. I have this table:
>
> CREATE TABLE ip2location (
> ip_address_from BIGINT NOT NULL,
> ip_address_to BIGINT NOT NULL,
> id_location BIGINT NOT NULL,
> PRIMARY KEY (ip_address_from, ip_address_to)
> );
>
> I created a cluster on its primary key, by running:
> CLUSTER ip2location_ip_address_from_key ON ip2location;
>
> This allowed me to organise data in a more efficient way: the data that
> is contained are ranges of IP addresses with empty intersections; for
> every IP class there is a related location's ID. The total number of
> entries is 1392443.
>
> For every IP address I have, an application retrieves the corresponding
> location's id from the above table, by running a query like:
>
> SELECT id_location FROM ip2location WHERE '11020000111' >=
> ip_address_from AND '11020000111' <= ip_address_to;
>
> For instance, by running the 'EXPLAIN ANALYSE' command, I get this
> "funny" result:
>
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------
>
> Seq Scan on ip2location (cost=0.00..30490.65 rows=124781 width=8)
> (actual time=5338.120..40237.283 rows=1 loops=1)
> Filter: ((1040878301::bigint >= ip_address_from) AND
> (1040878301::bigint <= ip_address_to))
> Total runtime: 40237.424 ms
>
>
> With other data, that returns an empty set, I get:
>
> explain SELECT id_location FROM ip2location WHERE '11020000111' >=
> ip_address_from AND '11020000111' <= ip_address_to;
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>
> Index Scan using ip2location_ip_address_from_key on ip2location
> (cost=0.00..419.16 rows=140 width=8)
> Index Cond: ((11020000111::bigint >= ip_address_from) AND
> (11020000111::bigint <= ip_address_to))
>
>
> I guess the planner chooses the best of the available options for the
> first case, the sequential scan. This is not confirmed though by the
> fact that, after I ran "SET enable_scan TO off", I got this:
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Index Scan using ip2location_ip_address_from_key on ip2location
> (cost=0.00..31505.73 rows=124781 width=8) (actual
> time=2780.172..2780.185 rows=1 loops=1)
> Index Cond: ((1040878301::bigint >= ip_address_from) AND
> (1040878301::bigint <= ip_address_to))
> Total runtime: 2780.359 ms
>
>
> Is this a normal case or should I worry? What am I missing? Do you have
> any suggestion or comment to do (that would be extremely appreciated)?
> Is the CLUSTER I created worthwhile or not?
>
> Thank you,
> -Gabriele
>
> --
> Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member,
> ht://Check maintainer
> Current Location: Prato, Toscana, Italia
> angusgb(at)tin(dot)it | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy,
> The Inferno
>
>
> ------------------------------------------------------------------------
>
>
> ---
> Outgoing mail is certified Virus Free.
> Checked by AVG anti-virus system (http://www.grisoft.com)
> Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004
>
>
> ------------------------------------------------------------------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Browne 2004-10-12 20:02:53 Re: Which plattform do you recommend I run PostgreSQL for best performance?
Previous Message Francisco Reyes 2004-10-12 16:27:14 Re: Which plattform do you recommend I run PostgreSQL for