From: | Jonathan Vanasco <postgres(at)2xlp(dot)com> |
---|---|
To: | Marc Mamin <M(dot)Mamin(at)intershop(dot)de>, PostgreSQL General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: faster way to calculate top "tags" for a "resource" based on a column |
Date: | 2014-10-07 15:42:42 |
Message-ID: | F6D7998D-44FD-495E-8900-CCDE561999F1@2xlp.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Oct 7, 2014, at 10:02 AM, Marc Mamin wrote:
> Hi,
> it seems to me that your subquery may deliver duplicate ids.
> And with the selectivity of your example, I would expect an index usage
> instead of a table scan. You may check how up to date your statistics are
> and try to raise the statistic target on the column resource_2_tag.tag_id.
> Also try a CTE form for your query:
It shouldn't be able to deliver duplicate ids.
=> SELECT COUNT(*) FROM (SELECT DISTINCT id FROM resource WHERE resource_attribute1_id = 614) AS foo;
count ------- 5184
=> SELECT COUNT(*) FROM (SELECT id FROM resource WHERE resource_attribute1_id = 614) AS foo;
count ------- 5184
However, adding in the DISTINCT drastically changed the query plan, and did give a speedup.
Your comment made me focus on the notion of a Table Scan. I assumed it did the seq scan - and there would not be much savings otherwise - because the table is just 2 ids.
I was wrong.
I noticed that I never put a PRIMARY KEY constraint on that table.
So i tried adding a PRIMARY KEY constraint, then running vacuum analyze...
And that solved all my problems.
the original query ended up being the fastest at 260ms ( down from 1760 )
Join - 260ms
Subquery w/DISTINCT - 300ms
CTE - 330
CTE w/DISTINCT - 345ms
Subquery (no DISTINCT) - 1500ms
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2014-10-07 17:16:38 | Re: Converting char to varchar automatically |
Previous Message | Sergey Konoplev | 2014-10-07 15:29:50 | Re: Converting char to varchar automatically |