From: | "Simon Riggs" <simon(at)2ndquadrant(dot)com> |
---|---|
To: | <pgsql(at)mohawksoft(dot)com>, "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com> |
Cc: | <pgsql-hackers(at)postgresql(dot)org>, <rm_pg(at)cheapcomplexdevices(dot)com> |
Subject: | Re: correlation in pg_stats |
Date: | 2005-02-08 20:32:24 |
Message-ID: | KGEFLMPJFBNNLNOOOPLGAEJLCIAA.simon@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>M(at)a@r(at)k wrote
> > Short summary:
> >
> > * It looks to me like the planner vastly overestimates
> > the # of pages read by index scan in quite a few of my
> > tables even though stats collected by ANALYZE are correct.
> >
> > * The problem happens any time you have multiple columns
> > that have a number of repeated values in them, and
> > you CLUSTER the table by a sort using both columns
> > (like "city,state,zip,phone#" or "firstname,lastname").
> >
> > * I think this is the problem that Mark Kirkwood is seeing
> > in his threads Query optimizer 8.0.1 and "One Big trend
> > vs multiple smaller trends" in hackers.
>
> actually pgsql(at)mohawksoft(dot)com, is Mark Woodward. Pleased to meet you.
> :)
>
> (I hate using my name on lists like this because of spammers)
>
> >
> > * A test script demonstrating the issue also follows.
> >
> > * I think keeping one more stat per attribute in
> > pg_stastic that could describe this behavior.
> >
> >
> > Longer:
> >
> >
> > If I understand the optimizer correctly, correlation is used
> > to both guess how much random disk access will be required in
> > a query; as well as estimate how many pages will be read.
> >
> > Unfortunately, many tables in my larger databases have
> > columns with values that are tightly packed on a few pages;
> > even though there is no total-ordering across the whole table.
> > Stephan Szabo described this as a "clumping effect":
> >
> http://archives.postgresql.org/pgsql-performance/2003-01/msg00286.php
>
> Yes.
>
> I think we are describing the exact same issue.
>
I believe that is a very common case.
Many natural value distributions follow the Zipf distribution, which is
commonly known as the 80-20 curve.
Best Regards, SImon Riggs
From | Date | Subject | |
---|---|---|---|
Next Message | Marc G. Fournier | 2005-02-08 20:48:00 | Re: "external indices" ... |
Previous Message | Oleg Bartunov | 2005-02-08 20:21:59 | Re: "external indices" ... |