From: | Lee Hachadoorian <lee(dot)hachadoorian(at)gmail(dot)com> |
---|---|
To: | Anders Østergaard Jensen <aj(at)itersys(dot)dk> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: Question on COUNT performance |
Date: | 2010-07-14 16:21:26 |
Message-ID: | AANLkTikWZdVzUQlOLRnPvqI0aJ9pNdUtUfeSLiXGIjfg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
The first statement of the function
> : select into user *
> from users where id = uid;
appears to be a useless drag, as I don't see the user record referred
to anywhere else in the function. There appears to be other
unnecessary statements. For example :
> : select into pcount_rel
> COUNT(*) as acl_count FROM project_plan_events ppe WHERE ppe.plan_event_id =
> plan_id;
>
> : if
> (pcount_rel.acl_count > 0) then
>
>
>
> : SELECT INTO
> project * FROM projects WHERE id IN (SELECT project_id FROM
> project_plan_events ppe2 WHERE ppe2.plan_event_id = plan_id) OFFSET 0 LIMIT
> 1;
>
> : return
> f_project_acl(uid, project.id);
>
>
> : end if;
>
This appears to check whether the plan_id exists in a link table, find
an associated project_id, and run some function on project_id.
This could instead be done as:
FOR project_record IN SELECT project_id FROM project_plan_events WHERE
plan_event_id = plan_id LIMIT 1 LOOP
return f_project_acl(uid, project_record.project_id)
END LOOP;
If plan_id doesn't exist in project_plan_events, the LOOP just gets skipped.
The same could be done for the next IF block in they query which
checks to see whether plan_id has a matching customer_id in a link
table.
Note that your LIMIT 1 (which I have retained) strongly implies a
1-to-1 relationship between project_id and plan_id. If not, this
function gets applied to an arbitrary project_id from among all
matching project_ids. (Same goes for customer_id.)
Assuming f_project_acl and f_customer_acl return TRUE if successful,
the whole thing (from the original SELECT COUNT(*) looks like it can
be summarized as:
Call a function with a plan_id
If a matching project_id exists
Do some function on the project_id
count +1
Else If a matching customer_id exists
Do some function on the customer_id
count +1
Else
count +1
Return count, which, since the function gets called once for each row
in plan_events, count should always equal the number of rows in plan
events.
I would be inclined to replace the whole thing with something like this:
SELECT newfunc(uid);
CREATE FUNCTION newfunc(uid int) RETURNS int AS $$
DECLARE
plan_record record;
i int := 0;
BEGIN
FOR plan_record IN SELECT DISTINCT plan_id, project_id, customer_id
FROM plan_events LEFT JOIN project_plan_events USING (plan_id) LEFT
JOIN customer_plan_events USING (plan_id) LOOP
IF plan_record.project_id IS NOT NULL THEN
PERFORM f_project_acl(uid, plan_record.project_id);
ELSEIF plan_record.customer_id IS NOT NULL THEN
PERFORM f_customer_acl(uid, plan_record.customer_id);
END IF;
i := i + 1;
END LOOP ;
RETURN i;
END;
$$ LANGUAGE plpgsql;
If I understand what's going on in your function, I *think* this would
reduce 9000-12,000 SELECT statements to 1 SELECT statement.
Obviously, not tested. Hope this is helpful.
--Lee
--
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center
From | Date | Subject | |
---|---|---|---|
Next Message | Lee Hachadoorian | 2010-07-14 16:49:14 | Re: Question on COUNT performance |
Previous Message | Thomas Kellerer | 2010-07-14 16:06:22 | Re: subtract two dates to get the number of days |