From: Florian Pflug <fgp(at)phlo(dot)org>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal : cross-column stats
Date: 2010-12-24 12:37:29
Message-ID: 1A6ED2FD-7126-4C08-A602-9152C74E7011@phlo.org
Lists: pgsql-hackers
On Dec24, 2010, at 11:23 , Nicolas Barbier wrote:
> 2010/12/24 Florian Pflug <fgp(at)phlo(dot)org>:
>
>> On Dec23, 2010, at 20:39 , Tomas Vondra wrote:
>>
>>> I guess we could use the highest possible value (equal to the number
>>> of tuples) - according to wiki you need about 10 bits per element
>>> with 1% error, i.e. about 10MB of memory for each million
>>> elements.
>>
>> Drat. I had expected these numbers to come out quite a bit lower than
>> that, at least for a higher error target. But even with 10% false
>> positive rate, it's still 4.5MB per 1e6 elements. Still too much to
>> assume the filter will always fit into memory, I fear :-(
>
> I have the impression that both of you are forgetting that there are 8
> bits in a byte. 10 bits per element = 1.25MB per million elements.
Uh, of course. So in the real universe, the numbers are:
~1.2MB per 1e6 elements for a false positive rate of 1%
~0.5MB per 1e6 elements for a false positive rate of 10%
Hm. So for a table with a billion distinct elements, we'd need half
a gigabyte per column for the filter (at the 10% false positive rate).
A tuple with two int columns takes at least 24 + 2*4 = 32 bytes to
store, I think, making such a table at least 32GB in size. The filter
size would thus be about 1/64 of the table size in the worst case.
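
For reference, the figures above follow from the usual Bloom filter
sizing rule bits/element = -ln(p) / (ln 2)^2. Below is a small
stand-alone C sketch (purely illustrative, not code from any patch or
from the PostgreSQL tree) that reproduces the per-million-element
sizes and the billion-row worst case, under the same tuple assumption
as above (24-byte header plus two 4-byte ints):

/*
 * Back-of-envelope check of the Bloom filter sizes above, using the
 * standard sizing formula bits/element = -ln(p) / (ln 2)^2 for a
 * target false positive rate p. Compile with: cc bloomsize.c -lm
 */
#include <math.h>
#include <stdio.h>

static double bits_per_element(double p)
{
    return -log(p) / (log(2) * log(2));
}

int main(void)
{
    const double rates[] = {0.01, 0.10};

    /* Filter size per one million distinct elements (decimal MB). */
    for (int i = 0; i < 2; i++)
    {
        double bpe = bits_per_element(rates[i]);

        printf("p = %.2f: %.1f bits/element, %.2f MB per 1e6 elements\n",
               rates[i], bpe, bpe * 1e6 / 8.0 / 1e6);
    }

    /*
     * Worst case from above: 1e9 distinct values, 10% error rate,
     * tuples with a 24-byte header plus two 4-byte int columns.
     */
    {
        double n = 1e9;
        double filter_gb = bits_per_element(0.10) * n / 8.0 / 1e9;
        double table_gb = n * (24 + 2 * 4) / 1e9;

        printf("1e9 elements: filter %.2f GB, table %.0f GB\n",
               filter_gb, table_gb);
    }
    return 0;
}

(With the unrounded numbers the 10% case comes out at roughly 0.6 MB
per million elements and 0.6 GB for the billion rows; the 1/64 figure
above uses the rounded half-gigabyte estimate.)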
best regards,
Florian Pflug