Re: min/max performance inequality.

From: Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: min/max performance inequality.
Date: 2015-01-08 00:00:07
Message-ID: CAMnJ+BdFqTGxBOc0OQ0MEL82q0yhkUmM2wE301xsDFLMgG31NA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Jeff (and Tom)

On Wed, Jan 7, 2015 at 3:34 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Wed, Jan 7, 2015 at 3:00 PM, Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com>
> wrote:
>
>> Hi.
>>
>> I was wondering how come there is such a drastic difference between
>> finding max and min. Seems like "index scan backwards" is really bad... The
>> table is freshly re-indexed just in case. I added a count(*) in there,
>> forcing the seq scan, and it's even better than the backwards index scan...
>>
>> db=> EXPLAIN ANALYZE select min(rowdate) from r_agrio where blockid =
>> 4814;
>>
>
> It crawls the data in rowdate order (either forward or reverse) until it
> finds the first 4814. Crawling forward it finds 4814 very early. Crawling
> backwards it has to pass through a bunch of non-4814 before it finds the
> first 4814.
>
> This fact doesn't show up in your EXPLAIN ANALYZE, but if you used a more
> modern version of postgresql (9.2 or above) there would be another line for
> "Rows Removed by Filter:" which would tell the story of what is going on.
>

Yeah, there is 10x more rows on when going backwards

>
> If you have a composite index on (blockid, rowdate), it would help make
> this much faster, as it can go directly to the desired row.
>

That does help a lot. So, when does postgres use a more-dimensional index,
even if not all dimensions are engaged (as there is an index that involves
those 2 fields, and more)? I definitely see it do that in some cases...

Even with that index, however, there is still a good difference in time
(the interest is theoretical at this point, as I found a better way to
extract that data anyway).

On a newer db.

db=> EXPLAIN analyze select min(rowdate) from r_agrio where blockid = 4814;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=521.54..521.55 rows=1 width=0) (actual time=39.770..39.770
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.55..521.54 rows=1 width=13) (actual
time=39.765..39.766 rows=1 loops=1)
-> Index Scan using rowdate_r_agrio on r_agrio
(cost=0.55..303738.47 rows=583 width=13) (actual time=39.763..39.763
rows=1 loops=1)
Index Cond: ((rowdate)::text IS NOT NULL)
Filter: (blockid = 4814::numeric)
Rows Removed by Filter: 37246
Total runtime: 39.798 ms
(8 rows)

db=> EXPLAIN analyze select max(rowdate) from r_agrio where blockid = 4814;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=521.54..521.55 rows=1 width=0) (actual
time=1497.377..1497.378 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.55..521.54 rows=1 width=13) (actual
time=1497.371..1497.372 rows=1 loops=1)
-> Index Scan Backward using rowdate_r_agrio on r_agrio
(cost=0.55..303738.47 rows=583 width=13) (actual time=1497.370..1497.370
rows=1 loops=1)
Index Cond: ((rowdate)::text IS NOT NULL)
Filter: (blockid = 4814::numeric)
Rows Removed by Filter: 317739
Total runtime: 1497.407 ms
(8 rows)
db=> CREATE INDEX concurrently xxx on r_agrio(rowdate,blockid);
CREATE INDEX

db=> EXPLAIN analyze select min(rowdate) from r_agrio where blockid = 4814;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
Result (cost=85.05..85.06 rows=1 width=0) (actual time=17.585..17.585
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..85.05 rows=1 width=13) (actual
time=17.580..17.581 rows=1 loops=1)
-> Index Only Scan using xxx on r_agrio (cost=0.43..37827.09
rows=447 width=13) (actual time=17.578..17.578 rows=1 loops=1)
Index Cond: ((rowdate IS NOT NULL) AND (blockid =
4814::numeric))
Heap Fetches: 0
Total runtime: 17.616 ms
(7 rows)

db=> EXPLAIN analyze select max(rowdate) from r_agrio where blockid = 4814;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=85.04..85.05 rows=1 width=0) (actual time=89.141..89.142
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..85.04 rows=1 width=13) (actual
time=89.135..89.136 rows=1 loops=1)
-> Index Only Scan Backward using xxx on r_agrio
(cost=0.43..37823.09 rows=447 width=13) (actual time=89.134..89.134 rows=1
loops=1)
Index Cond: ((rowdate IS NOT NULL) AND (blockid =
4814::numeric))
Heap Fetches: 1
Total runtime: 89.173 ms
(7 rows)

>
> Cheers,
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Michael Paquier 2015-01-08 00:19:20 Re: How to exclude building/installing contrib modules on Windows
Previous Message Tom Lane 2015-01-07 23:35:51 Re: min/max performance inequality.