Re: Fwd: Raster performance

From: PT <wmoran(at)potentialtech(dot)com>
To: David Haynes II <dahaynes(at)umn(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Fwd: Raster performance
Date: 2015-05-29 20:44:35
Message-ID: 20150529164435.e218889770d67346ecf72e38@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 29 May 2015 13:37:24 -0500
David Haynes II <dahaynes(at)umn(dot)edu> wrote:
>
> Let me explain this a bit more clearly. The dataset (table) that we are
> using in this analysis is produced from a satellite image (tif) called
> MODIS. This image can be placed into PostgreSQL using the raster2pgsql
> command (http://postgis.net/docs/using_raster_dataman.html). The
> raster2pgsql command reads the tif image and creates a table that can be
> used with the PostGIS extensions. There is no difference between the
> modis_noout and modis, what has happened is that we have used the
> raster2pgsl -R flag in the conversion process. The -R flag creates metadata
> for the raster file. For example, the modis dataset as tif is 46 GB, the
> same dataset stored as table within the database is 1.74 GB and the using
> outdb the metadata table is 52 MB.

What is the output of:
SELECT pg_total_relation_size('modis') AS "modis",
pg_total_relation_size('modis_noout') AS "modis_noout";
on both indb and outdb?

> We have seen a decrease in performance time when using OutDB, keep in mind
> that the schema and indices on the tables are exactly the same.
>
> CREATE TABLE modis (
> rid integer NOT NULL,
> rast public.raster,
> filename text,
> CONSTRAINT enforce_height_rast CHECK ((public.st_height(rast) = 250)),
> CONSTRAINT enforce_nodata_values_rast CHECK
> (((public._raster_constraint_nodata_values(rast))::numeric(16,10)[] =
> '{NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL}'::numeric(16,10)[])),
> CONSTRAINT enforce_num_bands_rast CHECK ((public.st_numbands(rast) =
> 12)),
> CONSTRAINT enforce_out_db_rast CHECK
> ((public._raster_constraint_out_db(rast) =
> '{t,t,t,t,t,t,t,t,t,t,t,t}'::boolean[])),
> CONSTRAINT enforce_pixel_types_rast CHECK
> ((public._raster_constraint_pixel_types(rast) =
> '{8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI,8BUI}'::text[])),
> CONSTRAINT enforce_same_alignment_rast CHECK
> (public.st_samealignment(rast,
> '01000000002EC50BE300F57C4026C50BE300F57CC0E7FBA955801673C13202AA558016634100000000000000000000000000000000BA1A000001000100'::public.raster)
> ),
> CONSTRAINT enforce_scalex_rast CHECK
> (((public.st_scalex(rast))::numeric(16,10) =
> 463.312716527917::numeric(16,10))),
> CONSTRAINT enforce_scaley_rast CHECK
> (((public.st_scaley(rast))::numeric(16,10) =
> (-463.312716527917)::numeric(16,10))),
> CONSTRAINT enforce_srid_rast CHECK ((public.st_srid(rast) = 6842)),
> CONSTRAINT enforce_width_rast CHECK ((public.st_width(rast) = 250))
> );
> CREATE SEQUENCE modis_rid_seq
> START WITH 1
> INCREMENT BY 1
> NO MINVALUE
> NO MAXVALUE
> CACHE 1;
> ALTER TABLE ONLY modis ALTER COLUMN rid SET DEFAULT
> nextval('modis_rid_seq'::regclass);
> ALTER TABLE ONLY modis ADD CONSTRAINT modis_pkey PRIMARY KEY (rid);
> CREATE INDEX modis_rast_gist ON modis USING gist
> (public.st_convexhull(rast));
>
>
> Here is the explain analyze
>
> InDB
> Sort (cost=69547.29..69562.80 rows=6204 width=254) (actual
> time=131042.478..131042.877 rows=7612 loops=1)
> Sort Key: r.id, r.name
> Sort Method: quicksort Memory: 787kB
> CTE poly
> -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109 width=62247)
> (actual time=3.008..8135.679 rows=3109 loops=1)
> CTE rast_select
> -> Nested Loop (cost=0.28..51767.41 rows=62033 width=272) (actual
> time=170.799..128606.266 rows=7677 loops=1)
> -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)
> (actual time=3.058..8212.030 rows=3109 loops=1)
> -> Index Scan using modis_noout_rast_gist on modis_noout r_1
> (cost=0.28..16.56 rows=2 width=22) (actual time=4.329..11.244 rows=2
> loops=3109)
> Index Cond: ((rast)::geometry && s.geom)
> Filter: _st_intersects(s.geom, rast, NULL::integer)
> Rows Removed by Filter: 0
> -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254) (actual
> time=131033.902..131036.105 rows=7612 loops=1)
> -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033
> width=254) (actual time=170.814..128659.842 rows=7677 loops=1)
> Total runtime: 131414.752 ms
>
> OutDB
> Sort (cost=93911.29..93926.80 rows=6204 width=254) (actual
> time=866326.762..866327.148 rows=7612 loops=1)
> Sort Key: r.id, r.name
> Sort Method: quicksort Memory: 787kB
> CTE poly
> -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109 width=62247)
> (actual time=1.327..6954.020 rows=3109 loops=1)
> CTE rast_select
> -> Nested Loop (cost=0.28..76131.41 rows=62033 width=1086) (actual
> time=257.610..863474.778 rows=7677 loops=1)
> -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)
> (actual time=1.341..7030.138 rows=3109 loops=1)
> -> Index Scan using modis_rast_gist on modis r_1
> (cost=0.28..24.40 rows=2 width=836) (actual time=1.481..3.952 rows=2
> loops=3109)
> Index Cond: ((rast)::geometry && s.geom)
> Filter: _st_intersects(s.geom, rast, NULL::integer)
> Rows Removed by Filter: 0
> -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254) (actual
> time=866317.923..866320.316 rows=7612 loops=1)
> -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033
> width=254) (actual time=257.625..863555.082 rows=7677 loops=1)
> Total runtime: 866691.113 ms
>
> On Fri, May 29, 2015 at 6:40 AM, PT <wmoran(at)potentialtech(dot)com> wrote:
>
> > On Thu, 28 May 2015 10:06:24 -0500
> > David Haynes II <dahaynes(at)umn(dot)edu> wrote:
> > >
> > > The query run times are significantly slower on outdb as that using indb
> > > here are the run times on 2 queries.
> > >
> > > ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US
> > > Counties)
> > > OutDB: 873.564s (14 minutes 33s) InDB: 127.36s (2 minutes 7s)
> > >
> > > ST_Count(select single band here)/ST_Clip(on all bands)/Inner
> > > Join/ST_Transform (US Counties)
> > > OutDB: 9537.371s (2 hours 38minutes) InDB: 310s (5 minutes 10 seconds)
> > >
> > > In the query planner it shows a large change in the number of columns
> > > (width) that are picked up in the CTE_rast_select.
> > > These extra columns slow down the ability to process the data.
> >
> > As I mentioned before, that's not how that works. Those are estimates. If
> > those rows are actually different between the two tables, then your data
> > is different between the two databases and you can't expect the performance
> > to be the same.
> >
> > Additionally, the part you snip out below isn't the part that's different.
> > In particular, the difference is coming from the fact that one of the
> > plans uses modis and the other uses modis_noout.
> >
> > Does modis exist in indb? Does modis_noout exist on outdb? What is the
> > difference between these two tables? Because _that_ is where the time
> > difference is most likely happening (based on the explain output).
> >
> > Additionally, run EXPLAIN ANALYZE on these queries to get the actual
> > times in addition to the estimates. Furthermore, given that a lot of
> > the confusion in this question is due to a lack of iformation, it would
> > be a good idea to include the table definitions.
> >
> > > OUT DB
> > > CTE rast_select
> > > > -> Nested Loop (cost=0.28..76131.41 rows=62033 *width=1086)*
> > > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109
> > width=250)
> > >
> > >
> > > In DB
> > > Nested Loop (cost=0.28..51767.41 rows=62033 *width=272*)
> > > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109
> > width=250)
> > >
> > > On Wed, May 27, 2015 at 4:31 PM, PT <wmoran(at)potentialtech(dot)com> wrote:
> > >
> > > > On Tue, 26 May 2015 12:52:24 -0500
> > > > David Haynes II <dahaynes(at)umn(dot)edu> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I have a question about the query optimizer and its performance on
> > > > spatial
> > > > > datasets, specifically rasters. My use case is rather unique, the
> > > > > application that I am developing allows users to request
> > summarizations
> > > > of
> > > > > various geographic boundaries around the world. Therefore our raster
> > > > > datasets are global. We are in the process of conducting some
> > benchmarks
> > > > > for our system and we noticed something unexpected.
> > > > >
> > > > > The query is the same except the first is run on a raster (46gigs)
> > in out
> > > > > of database (outdb) and the second is the same raster (46gigs)
> > stored in
> > > > > database (indb). The raster is multibanded (13), with each band
> > > > > representing one entire MODIS global scene. A single year of MODIS is
> > > > > approximately 3.6 gigs.
> > > > >
> > > > > The outdb is being out performed by indb, because the query optimizer
> > > > gets
> > > > > smarter. But what is also interesting is all the extra pieces that
> > are
> > > > > brought in with outdb.
> > > > >
> > > > > with poly as
> > > > > ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom FROM
> > > > > us_counties )
> > > > > , rast_select as
> > > > > ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
> > > > > rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast,
> > > > s.geom) )
> > > > > select r.id, r.name, ST_Count(r.rast, 1, True)
> > > > >
> > > > >
> > > > > QUERY PLAN With Outdb
> > > > >
> > > >
> > --------------------------------------------------------------------------------------------------
> > > > > Sort (cost=93911.29..93926.80 rows=6204 width=254)
> > > > > Sort Key: r.id, r.name
> > > > > CTE poly
> > > > > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109
> > > > width=62247)
> > > > > CTE rast_select
> > > > > -> Nested Loop (cost=0.28..76131.41 rows=62033 width=1086)
> > > > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109
> > width=250)
> > > > > -> Index Scan using modis_rast_gist on modis r_1
> > > > > (cost=0.28..24.40 rows=2 width=836)
> > > > > Index Cond: ((rast)::geometry && s.geom)
> > > > > Filter: _st_intersects(s.geom, rast, NULL::integer)
> > > > > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254)
> > > > > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033
> > > > > width=254)
> > > > >
> > > > > QUERY PLAN With Indb
> > > > >
> > > > >
> > > >
> > -------------------------------------------------------------------------------------------------------------
> > > > > Sort (cost=69547.29..69562.80 rows=6204 width=254)
> > > > > Sort Key: r.id, r.name
> > > > > CTE poly
> > > > > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109
> > > > width=62247)
> > > > > CTE rast_select
> > > > > -> Nested Loop (cost=0.28..51767.41 rows=62033 width=272)
> > > > > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109
> > width=250)
> > > > > -> Index Scan using modis_noout_rast_gist on modis_noout
> > r_1
> > > > > (cost=0.28..16.56 rows=2 width=22)
> > > > > Index Cond: ((rast)::geometry && s.geom)
> > > > > Filter: _st_intersects(s.geom, rast, NULL::integer)
> > > > > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254)
> > > > > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033
> > > > > width=254)
> > > >
> > > > I could be missing something here, but I don't see how long the queries
> > > > actually take to run. Have you actually run the queries and timed them?
> > > > Keep in mind that analyze does not actually run the query, it only
> > plans
> > > > it, so the actual run time is unknown if all you do is analyze.
> > > >
> > > > The query plans appear to be equal, assuming there are slight variances
> > > > in the names of tables from one DB to another (and I assume that your
> > > > description of indb and outdb reflects the fact that there are (for
> > > > reasons unknown) two copies of the data).
> > > >
> > > > The only purpose to those estimates is to choose a good plan. If the
> > > > plan is bad for one database and both databases have the same data,
> > then
> > > > the plan will be bad for both.
> > > >
> > > > Since there have been no other responses, I'm guessing that others are
> > > > confused by your question as well. Can you describe the actual problem
> > > > that you're seeing?
> > > >
> > > > --
> > > > Bill Moran <wmoran(at)potentialtech(dot)com>
> > > >
> > > >
> > > > --
> > > > Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> > > > To make changes to your subscription:
> > > > http://www.postgresql.org/mailpref/pgsql-general
> > > >
> > >
> > >
> > >
> > > --
> > > David Haynes, Ph.D.
> > > Research Associate Terra Populus
> > > Minnesota Population Center
> > > www.terrapop.org
> >
> >
> > --
> > Bill Moran <wmoran(at)potentialtech(dot)com>
> >
>
>
>
> --
> David Haynes, Ph.D.
> Research Associate Terra Populus
> Minnesota Population Center
> www.terrapop.org

--
Bill Moran <wmoran(at)potentialtech(dot)com>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Daniel Begin 2015-05-29 20:56:14 Re: Planner cost adjustments
Previous Message PT 2015-05-29 20:34:34 Re: Planner cost adjustments