From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haisheng Yuan <hyuan(at)pivotal(dot)io>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap table scan cost per page formula
Date: 2017-12-21 04:51:04
Message-ID: CAMkU=1xaLs8RnWrY_jLAc_ON--5sT0-QCEbBcQCjR5oxMHc4tg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Dec 20, 2017 at 2:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Dec 20, 2017 at 4:20 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>> It is not obvious to me that the parabola is wrong. I've certainly seen
>> cases where reading every 2nd or 3rd block (either stochastically, or
>> modulus) actually does take longer than reading every block, because it
>> defeats read-ahead. But it depends a lot on your kernel version and
>> your kernel settings and your file system and probably other things as well.
>>
>
> Well, that's an interesting point, too. Maybe we need another graph that
> also shows the actual runtime of a bitmap scan and a sequential scan.
>
I did some low-level IO benchmarking, and reading every 3rd block actually
turns out to be 13 times slower than reading every block, using CentOS 6.9
with ext4 and the setting:

blockdev --setra 8192 /dev/sdb1

This is on some virtualized storage whose details I don't know, but which
behaves as if it were RAID/JBOD with around 6 independent spindles.
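(For reference, blockdev --setra is in 512-byte sectors, so 8192 here means
4MB of readahead; blockdev --getra /dev/sdb1 reports the current setting.)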
If I pick the 1/3 of the blocks to read stochastically rather than by
modulus, it is only 2 times slower than reading all of them. I assume that
is because occasional reads which happen to be adjacent to each other are
enough to make read-ahead kick in, while evenly spaced, never-adjacent
reads do not.
The stochastic pattern is probably a better model of how bitmap table scans
actually work, as there is no general reason to think the blocks they fetch
would be evenly spaced and never adjacent. So this result is in reasonable
agreement with how the current cost estimation works: the parabola peaks at
about twice the cost of the sequential scan.
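For anyone following along, the interpolation behind that parabola (as I
remember it from cost_bitmap_heap_scan in costsize.c, so double-check the
source) is roughly:

cost_per_page = random_page_cost -
    (random_page_cost - seq_page_cost) * sqrt(pages_fetched / T)

with total IO cost = pages_fetched * cost_per_page. A quick sketch of the
resulting curve, with T and the page costs made up for illustration:

perl -le '($r,$s,$T)=(4,1,1e6); for $i (1..10) { $P=$i/10*$T;
$cpp = $r - ($r-$s)*sqrt($P/$T);
printf "%.1f of table: cost/page %.2f, total IO cost %.0f\n", $i/10, $cpp, $P*$cpp }'

Where exactly the hump peaks relative to the sequential scan depends on the
random_page_cost/seq_page_cost ratio in effect.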
I used a file of about 10GB, because I happened to have one lying around.
## read every block ($_%3>5 is never true)
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x="";
next if $_%3>5; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024;
print length $x} '|uniq -c
1295683 8192
4317 0
real 0m38.329s
## read every 3rd block
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {$x="";
next if $_%3>0; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh, $x,8*1024;
print length $x} '|uniq -c
431894 8192
1439 0
real 8m54.698s
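## read a random 1/3 of the blocks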
time perl -wle 'open my $fh, "rand" or die; foreach (1..1300000) {$x="";
next if rand()>0.3333; sysseek $fh,$_*8*1024,0 or die $!; sysread $fh,
$x,8*1024; print length $x} '|uniq -c
431710 8192
1434 0
real 1m23.130s
Dropping the caches is a reasonable way to do this type of benchmark,
because it simulates what would happen if your data set were much larger
than RAM, without needing to actually use a data set much larger than RAM.
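(One way to sanity-check that the drop actually worked is to watch the page
cache size before and after, for example:

grep ^Cached /proc/meminfo

It should fall sharply right after the echo, though not to zero, since
things like tmpfs pages also count there.)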
It would be interesting to see what other people get for similar low-level
tests, as well as for actual bitmap scans.
Cheers,
Jeff