From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: ALTER TABLE ... ALTER COLUMN ... SET DISTINCT |
Date: | 2009-04-06 02:38:04 |
Message-ID: | 603c8f070904051938q666d97e4r646a2a2aac2f5c89@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Apr 5, 2009 at 10:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Apr 5, 2009 at 7:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> [ shrug... ] Precision is not important for this value: we are not
>>> anywhere near needing more than six significant digits for our
>>> statistical estimates. Range, on the other hand, could be important
>>> when dealing with really large tables.
>
>> I thought about that, and if you think that's better, I can implement
>> it that way. Personally, I'm unconvinced. The use case for
>> specifying a number of distinct values in excess of 2 billion as an
>> absolute number rather than as a percentage of the table size seems
>> pretty weak to me.
>
> I was more concerned about the other end of it. Your patch sets a
> not-too-generous lower bound on the percentage that can be represented ...
Huh? With a scaling factor of 1 million, you can represent anything
down to about 0.000001, which is apparently all you can expect out of
a float4 anyway.
http://archives.postgresql.org/pgsql-bugs/2009-01/msg00039.php
In fact, we could change the scaling factor to 1 billion if you like,
and it would then give you MORE significant digits than you'll get out
of a float4 (and you'll be able to predict the exact number that
you're gonna get). If someone has billions of rows in the table but
only thousands of distinct values, I would expect them to run a script
to count 'em up and specify the exact number, rather than specifying
some microscopic percentage. But there's certainly enough range in
int4 to tack on three more decimal places if you think it's warranted.
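To make the arithmetic above concrete, here is a minimal Python sketch, not code from the patch. The helper names `to_scaled_int` and `float4_roundtrip` are hypothetical, and the float4 behaviour is emulated by packing to IEEE 754 single precision. It checks that a scale of either 1 million or 1 billion still fits in a signed 32-bit integer, and shows how much of a nine-digit fraction survives a trip through float4.

```python
import struct

INT4_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def to_scaled_int(fraction, scale):
    """Encode a fraction of the table as an integer count of 1/scale units.

    Hypothetical helper for illustration only; raises if the result
    would not fit in an int4.
    """
    value = round(fraction * scale)
    if not 0 <= value <= INT4_MAX:
        raise OverflowError("scaled value does not fit in int4")
    return value


def float4_roundtrip(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


if __name__ == "__main__":
    for scale in (10**6, 10**9):
        # Smallest nonzero fraction representable at this scale,
        # and whether a fraction of 1.0 still fits in int4.
        print(scale, 1 / scale, scale <= INT4_MAX)

    fraction = 0.123456789
    # The scaled integer keeps all nine decimal places exactly.
    print(to_scaled_int(fraction, 10**9))
    # The float4 value only agrees with the input to roughly 7 digits.
    print(float4_roundtrip(fraction))
```

With the scaled-integer form, the stored number is exactly what the user typed, which is the "predict the exact number" point above; the float4 form is rounded to the nearest representable value.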
(It's also worth pointing out that the calculations we do with
ndistinct are pretty rough approximations anyway. If the difference between
stadistinct = -1 x 10^-6 and stadistinct = -1.4 x 10^-6 is the thing
that's determining whether the planner is picking the correct plan on
your 4-billion-row table, you probably want to tune some other
parameter as well so as to get further away from that line. Just
getting the value in the ballpark should be a big improvement over how
things stand now.)
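For a sense of scale, here is a small Python sketch of what those two settings mean on a 4-billion-row table. It assumes only the documented meaning of a negative stadistinct (the negative of a multiplier on the row count); the function name `ndistinct_estimate` is hypothetical, and the planner's real code applies further rounding and clamping that this ignores.

```python
def ndistinct_estimate(stadistinct, ntuples):
    """Turn a stadistinct setting into an estimated distinct-value count.

    Simplified illustration: a negative value is treated as a fraction
    of the row count, a positive value as an absolute count.
    """
    if stadistinct < 0:
        return -stadistinct * ntuples
    return stadistinct


if __name__ == "__main__":
    rows = 4_000_000_000
    low = ndistinct_estimate(-1.0e-6, rows)
    high = ndistinct_estimate(-1.4e-6, rows)
    # Both estimates are a few thousand distinct values out of 4 billion rows.
    print(round(low), round(high))
```

Both settings describe a column with a few thousand distinct values in billions of rows, so a plan choice that flips between them is sitting very close to a cost boundary.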
...Robert