Quick Links

Query Performance

From:	Diego Vargas <diegov(at)propaas(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Query Performance
Date:	2017-02-20 21:39:52
Message-ID:	CALYPHJfJ4MEvQO939ruv8mXyNBe1GKO2vs0yPrB=7a+ZDV1YcQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi All,

I'm having some trouble improving the timing of a set of queries to a
partitioned table.
Basically, I'm trying to find an index that would be used instead of a
bitmap heap scan by when the data is taken from disk. Or in any case,
something that would make the process of retrieving the data from disk
faster.

I'm running on CentOS 6.8, and all the tests are being done through psql.

The child tables exists based on the protocol column. Now, each partition
looks like this:

Now, I have this query that I've executed with a clean cache:
lportal=# explain analyze SELECT org_name, site_name, latitude, longitude,
elevation, measured_on, data FROM data_jsons_partition where protocol in
('aerosols','precipitations') and site_id in (... around 1000 site_id-s
...) and (measured_on >= '2013-09-24' and measured_on <= '2016-10-10')
order by org_name, site_name, measured_on limit 1000000;

And I get the following:
Limit (cost=149414.00..149518.52 rows=41806 width=110) (actual
time=25827.893..26012.065 rows=126543 loops=1)
-> Sort (cost=149414.00..149518.52 rows=41806 width=110) (actual
time=25827.889..25970.671 rows=126543 loops=1)
Sort Key: data_jsons_partition.org_name,
data_jsons_partition.site_name,
data_jsons_partition.measured_on
Sort Method: external merge Disk: 70616kB
-> Result (cost=0.00..146205.09 rows=41806 width=110) (actual
time=38.533..20810.204 rows=126543 loops=1)
-> Append (cost=0.00..146205.09 rows=41806 width=110)
(actual time=38.530..20739.245 rows=126543 loops=1)
-> Seq Scan on data_jsons_partition (cost=0.00..0.00
rows=1 width=608) (actual time=0.002..0.002 rows=0 loops=1)
Filter: ((protocol = ANY
('{partition_a,partition_b}'::text[])) AND (measured_on >=
'2013-09-24'::date) AND (measured_on <= '2016-10-10'::date) AND (site_id =
ANY ('{... 1000 site_id-s ...}'::integer[])))
-> Bitmap Heap Scan on partition_a_data_jsons_part
data_jsons_partition (cost=70.92..5209.38 rows=2132 width=114) (actual
time=38.526..812.397 rows=3017 loops=1)
Recheck Cond: ((measured_on >=
'2013-09-24'::date) AND (measured_on <= '2016-10-10'::date))
Filter: ((protocol = ANY ('{partition_a,
partition_b}'::text[])) AND (site_id = ANY ('{ ... }'::integer[])))
-> Bitmap Index Scan on partition_a
_data_jsons_part_mo (cost=0.00..70.39 rows=3014 width=0) (actual
time=2.974..2.974 rows=3017 loops=1)
Index Cond: ((measured_on >=
'2013-09-24'::date) AND (measured_on <= '2016-10-10'::date))
-> Bitmap Heap Scan on partition_b_data_jsons_part
data_jsons_partition (cost=4582.19..140995.72 rows=39673 width=110)
(actual time=738.486..19871.141 rows=123526 loops=1)
Recheck Cond: ((site_id = ANY ('{...
...}'::integer[])))
Filter: (protocol = ANY ('{partition_a,
partition_b}'::text[]))
-> Bitmap Index Scan on partition_b
_data_jsons_part_sm (cost=0.00..4572.27 rows=39673 width=0) (actual
time=715.684..715.684 rows=123526 loops=1)
Index Cond: ((site_id = ANY ('{...
...}'::integer[])))
Total runtime: 26049.062 ms

From this I've increased the effective_io_concurrency to 150 (since most of
the time was on fetching the data from the partition_b_data_jsons_part in
the second bitmap heap scan) and the work_mem to 1.5GB (for the sorting
that's being spilled on disk), improving the timing to 7 seconds (from
which 5-6 seconds comes from the sorting).

Now, this is a relative fast query. Some other doesn't specify the
protocol, and therefore goes over all the children tables. Those queries
takes around 5 minutes (without changes mentioned above) and around 1.5min
with the changes. Doing an explain analyze on those queries I see some of
the tables uses index scans (much slower than bitmap scan since there's
nothing on cache) and other the bitmap scans.

Is there a way to make it faster?

Thank you in advance.

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Mike Beaton	2017-02-21 12:36:36	Re: Correct use of cursors for very large result sets in Postgres
Previous Message	Mike Beaton	2017-02-19 07:54:18	Re: Correct use of cursors for very large result sets in Postgres