Re: Bitmap table scan cost per page formula

From: "Tels" <nospam-pg-abuse(at)bloodgate(dot)com>
To: "Jeff Janes" <jeff(dot)janes(at)gmail(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Haisheng Yuan" <hyuan(at)pivotal(dot)io>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bitmap table scan cost per page formula
Date: 2017-12-21 18:17:45
Message-ID: 9af1aa1ef2ead874ac6e83983eb3f576.squirrel@sm.webmail.pair.com
Lists: pgsql-hackers

Hi,

On Wed, December 20, 2017 11:51 pm, Jeff Janes wrote:
> On Wed, Dec 20, 2017 at 2:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
>
>> On Wed, Dec 20, 2017 at 4:20 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
>> wrote:
>>>
>>> It is not obvious to me that the parabola is wrong. I've certainly seen
>>> cases where reading every 2nd or 3rd block (either stochastically or by
>>> modulus) actually does take longer than reading every block, because it
>>> defeats read-ahead. But it depends a lot on your kernel version, your
>>> kernel settings, your file system and probably other things as well.
>>>
>>
>> Well, that's an interesting point, too. Maybe we need another graph that
>> also shows the actual runtime of a bitmap scan and a sequential scan.
>>
>
> I did some low-level IO benchmarking, and I actually find it 13 times
> slower to read every 3rd block than every block, using CentOS 6.9 with
> ext4 and the setting:
> blockdev --setra 8192 /dev/sdb1
> on some virtualized storage whose details I don't know, but which
> behaves as if it were RAID/JBOD with around 6 independent spindles.
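A parameterized version of this kind of low-level read test might look like the Perl sketch below. It is an illustration only, not Jeff's actual benchmark: the script name readpattern.pl, the read_blocks() helper and the pattern names are hypothetical, it assumes 8 KiB blocks, numbers blocks from 0, and leaves dropping the page cache to the caller.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

use constant BLOCK => 8 * 1024;    # assumed 8 KiB, PostgreSQL's default block size

# Hypothetical access patterns: each gets a 0-based block number and
# returns true if that block should be read.
my %PATTERNS = (
    all        => sub { 1 },
    every_3rd  => sub { $_[0] % 3 == 0 },
    random_3rd => sub { rand() < 1 / 3 },
);

# Read the selected blocks of $path with sysseek/sysread (unbuffered I/O).
# Returns (number of full blocks read, elapsed wall-clock seconds).
sub read_blocks {
    my ($path, $want) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my $nblocks = int((-s $fh || 0) / BLOCK);
    my ($full, $t0) = (0, time);
    for my $i (0 .. $nblocks - 1) {
        next unless $want->($i);
        sysseek($fh, $i * BLOCK, 0) or die "sysseek: $!";
        my $n = sysread($fh, my $buf, BLOCK);
        die "sysread: $!" unless defined $n;
        $full++ if $n == BLOCK;
    }
    close $fh;
    return ($full, time - $t0);
}

# Usage: perl readpattern.pl FILE PATTERN   (drop the page cache first)
if (@ARGV == 2) {
    my ($path, $name) = @ARGV;
    my $want = $PATTERNS{$name} or die "unknown pattern '$name'\n";
    my ($full, $secs) = read_blocks($path, $want);
    printf "%s: %d blocks in %.3f s\n", $name, $full, $secs;
}
```

For example, run `perl readpattern.pl /path/to/testfile every_3rd` after dropping the page cache as root. For scale: blockdev --setra counts 512-byte sectors, so --setra 8192 is a 4 MiB read-ahead window, or 512 blocks of 8 KiB.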

I repeated this here on my desktop (linux-image-4.10.0-42, Samsung SSD 850
EVO 500 GB, encrypted ext4 partition):

$ dd if=/dev/zero of=zero.dat count=1300000 bs=8192
1300000+0 records in
1300000+0 records out
10649600000 bytes (11 GB, 9,9 GiB) copied, 22,1993 s, 480 MB/s

All blocks (the condition $_%3>5 is never true, so every block is read):

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {
    $x=""; next if $_%3>5;
    sysseek $fh, $_*8*1024, 0 or die $!;
    sysread $fh, $x, 8*1024; print length $x }' | uniq -c
1299999 8192

real 0m20,841s
user 0m0,960s
sys 0m2,516s

Every 3rd block:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {
    $x=""; next if $_%3>0;
    sysseek $fh, $_*8*1024, 0 or die $!;
    sysread $fh, $x, 8*1024; print length $x }' | uniq -c
433333 8192

real 0m50,504s
user 0m0,532s
sys 0m2,972s

About every 3rd block, chosen at random:

$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
$ time perl -le 'open my $fh, "rand" or die; foreach (1..1300000) {
    $x=""; next if rand() > 0.3333;
    sysseek $fh, $_*8*1024, 0 or die $!;
    sysread $fh, $x, 8*1024; print length $x }' | uniq -c
432810 8192

real 0m26,575s
user 0m0,540s
sys 0m2,200s

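So it does get slower, but only by about 2.4 times (every 3rd block) and
about 30% (random third of the blocks) compared with reading every block,
even though only a third as many blocks are read.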
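Since the thread is about a per-page cost, the same numbers can also be expressed per block read. A small Perl sketch of that arithmetic, using the "real" times and the block counts from the uniq -c output of the three runs; us_per_block() is just a hypothetical helper name, and wall-clock time is assumed to be the whole cost:

```perl
use strict;
use warnings;

# "real" times (seconds) and blocks read, copied from the three runs.
my @runs = (
    [ 'all blocks',         20.841, 1_299_999 ],
    [ 'every 3rd block',    50.504,   433_333 ],
    [ 'random ~1/3 blocks', 26.575,   432_810 ],
);

# Average wall-clock time per 8 KiB block, in microseconds.
sub us_per_block {
    my ($secs, $blocks) = @_;
    return 1e6 * $secs / $blocks;
}

# The full sequential read is the baseline for both ratios.
my ($base_secs, $base_blocks) = @{ $runs[0] }[1, 2];
my $base_us = us_per_block($base_secs, $base_blocks);

for my $run (@runs) {
    my ($name, $secs, $blocks) = @$run;
    my $us = us_per_block($secs, $blocks);
    printf "%-18s %5.2fx total time, %6.1f us/block (%4.2fx per block)\n",
        $name, $secs / $base_secs, $us, $us / $base_us;
}
```

With these inputs it reports about 16.0 us per block for the full read; per block, the every-3rd-block run costs about 7.3 times as much and the random run about 3.8 times as much, even though their total times are only about 2.4 and 1.3 times higher.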

Hope this helps,

Tels
