From: | "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Strange issue with GiST index scan taking far too long |
Date: | 2008-06-09 14:24:08 |
Message-ID: | 20080609142408.GA9409@a-kretschmer.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
am Mon, dem 09.06.2008, um 14:18:50 +0100 mailte Mark Cave-Ayland folgendes:
> Hi there,
>
> I'm currently looking at a bug report in PostGIS where we are getting
> extremely long index scan times querying an index in one case, but the
> same scan can take much less time if the input geometry is calculated as
> the result of another function.
>
> First of all, I include the EXPLAIN ANALYZE of the basic query which
> looks like this:
>
>
> postgis=# explain analyze select count(*) from geography where centroid
> && (select the_geom from geography where id=69495);
>
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual
> time=2691.783..2691.784 rows=1 loops=1)
> InitPlan
> -> Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559)
> (actual time=60.987..61.373 rows=1 loops=1)
> Filter: (id = 69495::numeric)
> -> Index Scan using geography_geom_centroid_idx on geography
> (cost=0.00..8.28 rows=1 width=0) (actual time=79.241..2645.722
> rows=32880 loops=1)
> Index Cond: (centroid && $0)
> Filter: (centroid && $0)
> Total runtime: 2692.288 ms
> (8 rows)
>
There are a BIG difference between estimated rows and real rows (1 vs.
32880). Why? Do you have recent statistics?
>
> The only real thing to know about the query is that the id field within
> the geography table is a primary key, and hence only a single geometry
Sure? I can't believe this because the rows=32880.
> is being returned from within the subselect. Note that most of the time
> is disappearing into the index scan.
>
> Where things start getting really strange is when we add an extra
> function called force_2d() into the mix. All this function does is scan
> through the single geometry returned from the subselect and remove any
> 3rd dimension coordinates. Now the resulting EXPLAIN ANALYZE for this
> query looks like this:
>
>
> postgis=# explain analyze select count(*) from geography where centroid
> && (select force_2d(the_geom) from geography where id=69495);
>
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual
> time=343.004..343.005 rows=1 loops=1)
> InitPlan
> -> Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559)
> (actual time=48.714..49.016 rows=1 loops=1)
> Filter: (id = 69495::numeric)
> -> Index Scan using geography_geom_centroid_idx on geography
> (cost=0.00..8.28 rows=1 width=0) (actual time=49.367..235.296 rows=32880
> loops=1)
> Index Cond: (centroid && $0)
> Filter: (centroid && $0)
> Total runtime: 343.084 ms
> (8 rows)
>
>
> So by adding in an extra function around the subselect result, we have
> speeded up the index lookup by several orders of magnitude, and the
Wild guess: you have a big cache.
But i'm not a PostGIS-Expert.
Regards, Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2008-06-09 14:34:17 | Re: Message-ID should surely not be shown as a mailto: URL |
Previous Message | Andrew Dunstan | 2008-06-09 14:00:29 | Re: pg_dump restore time and Foreign Keys |