| From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> | 
|---|---|
| To: | Joe Van Dyk <joe(at)tanga(dot)com> | 
| Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> | 
| Subject: | Re: Query performance | 
| Date: | 2015-01-25 06:12:11 | 
| Message-ID: | CAFj8pRAg_VRwRaOBRwVMNtg8KQNs0W82i3EkCCj8_kNBtWtQ-w@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-performance | 
Hi
this plan looks well
Regards
Pavel
2015-01-25 6:45 GMT+01:00 Joe Van Dyk <joe(at)tanga(dot)com>:
> Oops, didn't run vacuum analyze after deleting the events. Here is another
> 'explain analyze': http://explain.depesz.com/s/AviN
>
> On Sat, Jan 24, 2015 at 9:43 PM, Joe Van Dyk <joe(at)tanga(dot)com> wrote:
>
>> On Sat, Jan 24, 2015 at 9:41 PM, Joe Van Dyk <joe(at)tanga(dot)com> wrote:
>>
>>> I have an events table that records page views and purchases (type =
>>> 'viewed' or type='purchased'). I have a query that figures out "people who
>>> bought/viewed this also bought/viewed that".
>>>
>>> It worked fine, taking about 0.1 seconds to complete, until a few hours
>>> ago when it started taking hours to complete. Vacuum/analyze didn't help.
>>> Turned out there was one session_id that had 400k rows in the system.
>>> Deleting that made the query performant again.
>>>
>>> Is there anything I can do to make the query work better in cases like
>>> that? Missing index, or better query?
>>>
>>> This is on 9.3.5.
>>>
>>> The below is reproduced at the following URL if it's not formatted
>>> correctly in the email.
>>> https://gist.githubusercontent.com/joevandyk/cb8f4afdb6c1b178c606/raw/9940bbe033ebd56d38caa46e33c1ddfd9df36eda/gistfile1.txt
>>>
>>> explain  select
>>>    e1.product_id,
>>>    e2.site_id,
>>>    e2.product_id,
>>>    count(nullif(e2.type='viewed', false)) view_count,
>>>    count(nullif(e2.type='purchased', false)) purchase_count
>>>  from events e1
>>>  join events e2 on e1.session_id = e2.session_id and e1.type = e2.type
>>>  where
>>>    e1.product_id = '82503' and
>>>    e1.product_id != e2.product_id
>>>  group by e1.product_id, e2.product_id, e2.site_id;
>>>                                                          QUERY PLAN
>>> ----------------------------------------------------------------------------------------------------------------------------
>>>  GroupAggregate  (cost=828395.67..945838.90 rows=22110 width=19)
>>>    ->  Sort  (cost=828395.67..840117.89 rows=4688885 width=19)
>>>          Sort Key: e1.product_id, e2.product_id, e2.site_id
>>>          ->  Nested Loop  (cost=11.85..20371.14 rows=4688885 width=19)
>>>                ->  Bitmap Heap Scan on events e1  (cost=11.29..1404.31 rows=369 width=49)
>>>                      Recheck Cond: (product_id = '82503'::citext)
>>>                      ->  Bitmap Index Scan on events_product_id_site_id_idx  (cost=0.00..11.20 rows=369 width=0)
>>>                            Index Cond: (product_id = '82503'::citext)
>>>                ->  Index Scan using events_session_id_type_product_id_idx on events e2  (cost=0.56..51.28 rows=12 width=51)
>>>                      Index Cond: ((session_id = e1.session_id) AND (type = e1.type))
>>>                      Filter: (e1.product_id <> product_id)
>>> (11 rows)
>>>
>>> recommender_production=> \d events
>>>                                     Table "public.events"
>>>    Column    |           Type           |                      Modifiers
>>> -------------+--------------------------+-----------------------------------------------------
>>>  id          | bigint                   | not null default nextval('events_id_seq'::regclass)
>>>  user_id     | citext                   |
>>>  session_id  | citext                   | not null
>>>  product_id  | citext                   | not null
>>>  site_id     | citext                   | not null
>>>  type        | text                     | not null
>>>  happened_at | timestamp with time zone | not null
>>>  created_at  | timestamp with time zone | not null
>>> Indexes:
>>>     "events_pkey" PRIMARY KEY, btree (id)
>>>     "events_product_id_site_id_idx" btree (product_id, site_id)
>>>     "events_session_id_type_product_id_idx" btree (session_id, type, product_id)
>>> Check constraints:
>>>     "events_session_id_check" CHECK (length(session_id::text) < 255)
>>>     "events_type_check" CHECK (type = ANY (ARRAY['purchased'::text, 'viewed'::text]))
>>>     "events_user_id_check" CHECK (length(user_id::text) < 255)
>>>
>>>
>>>
>>>
>> After removing the session with 400k events, I was able to do an explain
>> analyze, here is one of them:
>> http://explain.depesz.com/s/PFNk
>>
>
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Joe Van Dyk | 2015-01-25 06:38:04 | Re: Query performance | 
| Previous Message | Joe Van Dyk | 2015-01-25 05:45:50 | Re: Query performance |