Re: Performance trouble finding records through related records

From: Andy Colson <andy(at)squeakycode(dot)net>
To: sverhagen <sverhagen(at)wps-nl(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance trouble finding records through related records
Date: 2011-03-02 03:41:23
Message-ID: 4D6DBC63.7080109@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 03/01/2011 06:14 PM, sverhagen wrote:
> Hi, appreciated mailing list. Thanks already for taking your time for my
> performance question. Regards, Sander.
>
>
> ===POSTGRESQL VERSION AND ORIGIN===
>
> PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4
> (Ubuntu 4.2.4-1ubuntu3)
> Installed using "apt-get install postgresql-8.3"
>
>
> ===A DESCRIPTION OF WHAT YOU ARE TRYING TO ACHIEVE===
>
> Query involving tables events_events and events_eventdetails. There is any
> number of events_eventdetails records for each events_events record.
>
> There may be multiple records in events_events that have the same value for
> their transactionId, which is available in one of their events_eventdetails
> records.
>
> We want a total query that returns events_events records that match
> condition I. or II., sorted by datetime descending, first 50.
>
> Condition I.
> All events_events records for which an events_eventdetails records that
> matches the following conditions:
> - Column keyname (in events_eventdetails) equals "customerId", and
> - Column value (in events_eventdetails) equals 598124, or more precisely
> substring(customerDetails.value,0,32)='598124'
>
> Condition II.
> All events_events records that have a same value for in one of their
> events_eventdetails records with keyname 'transactionId' as any of the
> resulting events_events records of condition I.
>
> In other words: I want all events for a certain customerId, and all events
> with the same transactionId as those.
>
> The total query's time should be of the magnitude 100ms, but currently is of
> the magnitude 1min.
>
> JUST FOR THE PURPOSE OF EXPERIMENT I've now a denormalized copy of
> transactionId as a column in the events_events records. Been trying queries
> on those, with no improvements.
>
> I am not seeking WHY my query is too slow, rather trying to find a way to
> get it faster :-)
>

<much snippage>

First off, excellent detail.

Second, your explain analyze was hard to read... but since you are not really interested in your posted query, I wont worry about looking at it... but... have you seen:

http://explain.depesz.com/

Its nice.

And last, to my questions:

SELECT events1.id, events1.transactionId, events1.dateTime FROM
events_events events1
JOIN events_eventdetails customerDetails
ON events1.id = customerDetails.event_id
AND customerDetails.keyname='customer_id'
AND substring(customerDetails.value,0,32)='598124'
WHERE events1.eventtype_id IN
(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
UNION
SELECT events2.id, events2.transactionId, events2.dateTime FROM
events_events events2
JOIN events_eventdetails details2_transKey
ON events2.id = details2_transKey.event_id
AND details2_transKey.keyname='transactionId'
JOIN events_eventdetails details2_transValue
ON substring(details2_transKey.value,0,32) =
substring(details2_transValue.value,0,32)
AND details2_transValue.keyname='transactionId'
JOIN events_eventdetails customerDetails
ON details2_transValue.event_id = customerDetails.event_id
AND customerDetails.keyname='customer_id'
AND substring(customerDetails.value,0,32)='598124'
WHERE events2.eventtype_id IN
(100,103,105,106,45,34,14,87,58,78,7,76,11,25,57,98,30,35,33,49,52,28,85,59,23,22,51,48,36,65,66,18,13,86,75,44,38,43,94,56,95,96,71,50,81,90,89,16,17,88,79,77,68,97,92,67,72,53,2,10,31,32,80,24,93,26,9,8,61,5,73,70,63,20,60,40,41,39,101,104,107,99,64,62,55,69,19,46,47,15,21,27,54,12,102,108)
ORDER BY dateTime DESC LIMIT 50

If you run the individual queries, without the union, are the part's slow too?

Looked like your row counts (the estimate vs the actual) were way off, have you analyzed lately?

I could not tell from the explain analyze if an index was used, but I notice you have a ton of indexes on events_events table. You have two indexes on the same fields but in reverse order:

events_events_eventtype_id_datetime_ind (datetime, eventtype_id);
events_events_datetime_eventtype_id_ind (eventtype_id, datetime);

AND both eventtype_id and datetime are in other indexes! I think you need to review your indexes. Drop all of them and add one or two that are actually useful.

A useful tool I have found for complex queries is to break them down into smaller sub sets, write sql that get's me just those sets, and them add them all back into one main query with subselects:

select a,b,c,...
from events_events
where
id in ( select id from details where some subset is needed )
and id not in ( select id frome details where some set is bad )
and id in ( select anotherid from anothertable where ... )

Its the subselects you need to think about. Find one that gets you a small set that's interesting somehow. Once you get all your little sets, its easy to combine them.

-Andy

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2011-03-02 14:08:45 Re: Pushing IN (subquery) down through UNION ALL?
Previous Message sverhagen 2011-03-02 00:14:15 Performance trouble finding records through related records