From: | David Haynes II <dahaynes(at)umn(dot)edu> |
---|---|
To: | PT <wmoran(at)potentialtech(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Fwd: Raster performance |
Date: | 2015-05-28 15:06:24 |
Message-ID: | CAKNXh=os2wHF__4s=ZXotq7HzoL5Fut=2d+7xB+_UVVMWoCXFQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Sorry,
The query run times are significantly slower on outdb as that using indb
here are the run times on 2 queries.
ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US
Counties)
OutDB: 873.564s (14 minutes 33s) InDB: 127.36s (2 minutes 7s)
ST_Count(select single band here)/ST_Clip(on all bands)/Inner
Join/ST_Transform (US Counties)
OutDB: 9537.371s (2 hours 38minutes) InDB: 310s (5 minutes 10 seconds)
In the query planner it shows a large change in the number of columns
(width) that are picked up in the CTE_rast_select.
These extra columns slow down the ability to process the data.
OUT DB
CTE rast_select
> -> Nested Loop (cost=0.28..76131.41 rows=62033 *width=1086)*
> -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)
In DB
Nested Loop (cost=0.28..51767.41 rows=62033 *width=272*)
> -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)
On Wed, May 27, 2015 at 4:31 PM, PT <wmoran(at)potentialtech(dot)com> wrote:
> On Tue, 26 May 2015 12:52:24 -0500
> David Haynes II <dahaynes(at)umn(dot)edu> wrote:
>
> > Hello,
> >
> > I have a question about the query optimizer and its performance on
> spatial
> > datasets, specifically rasters. My use case is rather unique, the
> > application that I am developing allows users to request summarizations
> of
> > various geographic boundaries around the world. Therefore our raster
> > datasets are global. We are in the process of conducting some benchmarks
> > for our system and we noticed something unexpected.
> >
> > The query is the same except the first is run on a raster (46gigs) in out
> > of database (outdb) and the second is the same raster (46gigs) stored in
> > database (indb). The raster is multibanded (13), with each band
> > representing one entire MODIS global scene. A single year of MODIS is
> > approximately 3.6 gigs.
> >
> > The outdb is being out performed by indb, because the query optimizer
> gets
> > smarter. But what is also interesting is all the extra pieces that are
> > brought in with outdb.
> >
> > with poly as
> > ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom FROM
> > us_counties )
> > , rast_select as
> > ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
> > rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast,
> s.geom) )
> > select r.id, r.name, ST_Count(r.rast, 1, True)
> >
> >
> > QUERY PLAN With Outdb
> >
> --------------------------------------------------------------------------------------------------
> > Sort (cost=93911.29..93926.80 rows=6204 width=254)
> > Sort Key: r.id, r.name
> > CTE poly
> > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109
> width=62247)
> > CTE rast_select
> > -> Nested Loop (cost=0.28..76131.41 rows=62033 width=1086)
> > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)
> > -> Index Scan using modis_rast_gist on modis r_1
> > (cost=0.28..24.40 rows=2 width=836)
> > Index Cond: ((rast)::geometry && s.geom)
> > Filter: _st_intersects(s.geom, rast, NULL::integer)
> > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254)
> > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033
> > width=254)
> >
> > QUERY PLAN With Indb
> >
> >
> -------------------------------------------------------------------------------------------------------------
> > Sort (cost=69547.29..69562.80 rows=6204 width=254)
> > Sort Key: r.id, r.name
> > CTE poly
> > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109
> width=62247)
> > CTE rast_select
> > -> Nested Loop (cost=0.28..51767.41 rows=62033 width=272)
> > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)
> > -> Index Scan using modis_noout_rast_gist on modis_noout r_1
> > (cost=0.28..16.56 rows=2 width=22)
> > Index Cond: ((rast)::geometry && s.geom)
> > Filter: _st_intersects(s.geom, rast, NULL::integer)
> > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254)
> > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033
> > width=254)
>
> I could be missing something here, but I don't see how long the queries
> actually take to run. Have you actually run the queries and timed them?
> Keep in mind that analyze does not actually run the query, it only plans
> it, so the actual run time is unknown if all you do is analyze.
>
> The query plans appear to be equal, assuming there are slight variances
> in the names of tables from one DB to another (and I assume that your
> description of indb and outdb reflects the fact that there are (for
> reasons unknown) two copies of the data).
>
> The only purpose to those estimates is to choose a good plan. If the
> plan is bad for one database and both databases have the same data, then
> the plan will be bad for both.
>
> Since there have been no other responses, I'm guessing that others are
> confused by your question as well. Can you describe the actual problem
> that you're seeing?
>
> --
> Bill Moran <wmoran(at)potentialtech(dot)com>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
--
David Haynes, Ph.D.
Research Associate Terra Populus
Minnesota Population Center
www.terrapop.org
From | Date | Subject | |
---|---|---|---|
Next Message | Ravi Krishna | 2015-05-28 15:15:22 | Partitioning and performance |
Previous Message | Robert Haas | 2015-05-28 12:51:04 | Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1 |