Quick Links

Re: index choosing problem

From:	Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To:	Rural Hunter <ruralhunter(at)gmail(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: index choosing problem
Date:	2012-03-15 11:44:31
Message-ID:	CA+CSw_uYDPF+CDfk=pmmKM81xWp9gGw8y=QEYFNMR63KgnaJBg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

2012/3/15 Rural Hunter <ruralhunter(at)gmail(dot)com>:
> Now the query 3-5 using article_others_pkey are quite slow. The rows for cid
> 74 are very old and seldom get updated. I think pg needs to scan quite a lot
> on article_others_pkey before it gets the rows for cid 74. The same query
> for other cids with new and majority of rows runs very fast.

This is because the PostgreSQL cost model doesn't know about the
correlation between aid and cid. In absence of information it assumes
that it will find a row with cid=74 about every 68 rows
(889520/13047).

One option to fix this case is to use OFFSET 0 as an optimization barrier:
SELECT max(aid) FROM
(SELECT aid FROM article_others WHERE cid=74 OFFSET 0) AS x;

That has the unfortunate effect of performing badly for cid's that are
extremely popular. That may or may not be acceptable in your case.

To fix this properly the query optimizer needs to know the relationship between
aid and cid and needs to know how to apply that to estimating the cost
of index scans. A prerequisite for implementing this is to have
multi-column statistics. To do the estimation, the current linear cost
model needs to be changed to something that can express a non-linear
relationship between tuples returned and cost, e.g. a piece-wise
linear model. The stats collection part is actually feasible, in fact
I'm currently working on a patch for that. As for the estimation
improvement, I have an idea how it might work, but I'm not really sure
yet if the performance hit for query planning would be acceptable.

> Another question, why the plan shows rows=13047 for cid=74 while actually it
> only has 4199 rows? There is almost no data changes for cid 74 and I just
> vacuum/analyzed the table this morning.

Might just be an artifact of random sampling. Try raising your stats
target and re-analyzing to confirm.

All the best,
Ants Aasma

In response to

index choosing problem at 2012-03-15 08:32:09 from Rural Hunter

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Andy Colson	2012-03-16 13:55:39	Re: Shared memory for large PostGIS operations
Previous Message	Rural Hunter	2012-03-15 08:32:09	index choosing problem