Re: less than 2 sec for response - possible?

From: trafdev <trafdev(at)mail(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: less than 2 sec for response - possible?
Date: 2016-07-02 02:48:06
Message-ID: 2f7771c1-40f2-8c9b-7c0a-232935ff93ea@mail.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks Tom.

I've created index on aid, date:

create index aaa on stats.feed_sub(aid,date);

and simplified a query (dropped gran as it's equal for all rows anyway):

SELECT
sum(stats.feed_sub.c_filt_click_cap_ip) AS clicks_from_ip,
sum(stats.feed_sub.c_filt_doubleclick) AS clicks_on_target,
sum(stats.feed_sub.c_filt_delay_clicks) AS ip_click_period,
sum(stats.feed_sub.c_filt_fast_click) AS fast_click,
sum(stats.feed_sub.c_filt_ip_mismatch) AS ip_mismatch,
sum(stats.feed_sub.c_filt_lng_mismatch) AS lng_mismatch,
sum(stats.feed_sub.c_filt_ref_mismatch) AS ref_mismatch,
sum(stats.feed_sub.c_filt_ua_mismatch) AS ua_mismatch,
sum(stats.feed_sub.c_filt_url_expired) AS url_expired,
stats.feed_sub.subid AS stats_feed_sub_subid,
stats.feed_sub.sid AS stats_feed_sub_sid
FROM stats.feed_sub
WHERE stats.feed_sub.date >= '2016-06-01' :: TIMESTAMP AND
stats.feed_sub.date <= '2016-06-30' :: TIMESTAMP AND
stats.feed_sub.aid = 3
GROUP BY
stats.feed_sub.subid, stats.feed_sub.sid;

All data is in the cache and it still takes almost 5 seconds to complete:

QUERY PLAN
HashAggregate (cost=792450.42..803727.24 rows=346979 width=86) (actual
time=4742.145..4882.468 rows=126533 loops=1)
" Group Key: subid, sid"
Buffers: shared hit=1350371
-> Index Scan using aaa on feed_sub (cost=0.43..697031.39
rows=3469783 width=86) (actual time=0.026..1655.394 rows=3588376 loops=1)
Index Cond: ((aid = 3) AND (date >= '2016-06-01
00:00:00'::timestamp without time zone) AND (date <= '2016-06-30
00:00:00'::timestamp without time zone))
Buffers: shared hit=1350371
Planning time: 0.159 ms
Execution time: 4899.934 ms

It's better, but still is far from "<2 secs" goal.

Any thoughts?

On 07/01/16 18:23, Tom Lane wrote:
> trafdev <trafdev(at)mail(dot)ru> writes:
>> CREATE INDEX ix_feed_sub_date
>> ON stats.feed_sub
>> USING brin
>> (date);
>
>> CREATE UNIQUE INDEX ixu_feed_sub
>> ON stats.feed_sub
>> USING btree
>> (date, gran, aid, pid, sid, fid, subid COLLATE pg_catalog."default");
>
>> HashAggregate (cost=901171.72..912354.97 rows=344100 width=86) (actual
>> time=7207.825..7335.473 rows=126044 loops=1)
>> " Group Key: subid, sid"
>> Buffers: shared hit=3635804
>> -> Index Scan using ixu_feed_sub on feed_sub (cost=0.56..806544.38
>> rows=3440994 width=86) (actual time=0.020..3650.208 rows=3578344 loops=1)
>> Index Cond: ((date >= '2016-06-01 00:00:00'::timestamp without
>> time zone) AND (date <= '2016-06-30 00:00:00'::timestamp without time
>> zone) AND (gran = '1 day'::interval) AND (aid = 3))
>> Buffers: shared hit=3635804
>> Planning time: 0.150 ms
>> Execution time: 7352.009 ms
>
> Neither of those indexes is terribly well designed for this query.
> A btree index on (aid, gran, date) or (gran, aid, date) would work
> much better. See
>
> https://www.postgresql.org/docs/9.5/static/indexes-multicolumn.html
>
> You could rearrange the column order in that giant unique index
> and get some of the benefit. But if you're desperate to optimize
> this particular query, an index not bearing so many irrelevant columns
> would probably be better for it.
>
> An alternative way of thinking would be to create an index with those
> three leading columns and then all of the other columns used by this
> query as later columns. That would be an even larger index, but it would
> allow an index-only scan, which might be quite a lot faster. The fact
> that you seem to be hitting about one page for each row retrieved says
> that the data you need is pretty badly scattered, so constructing an index
> that concentrates everything you need into one range of the index might
> be the ticket.
>
> Either of these additional-index ideas is going to penalize table
> insertions/updates, so keep an eye on that end of the performance
> question too.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message devel.brain99 2016-07-04 12:19:08 Re: Random slow queries
Previous Message Tom Lane 2016-07-02 01:23:05 Re: less than 2 sec for response - possible?