Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Date: 2009-10-27 05:44:35
Message-ID: 4AE688C3.2080207@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Craig Ringer wrote:
> On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote:
>
>>> You should probably re-generate your random value for each call rather
>>> than store it. Currently, every document with commonterm20 is guaranteed
>>> to also have commonterm40, commonterm60, etc, which probably isn't very
>>> realistic, and also makes doc size correlate with word rarity.
>> I had that in the first version, but I wanted to have the gaurantee that
>> a commonterm60 was indeed a subset of commonterm80, so that why its
>> sturctured like that. I know its not realistic, but it gives measureable
>> results since I know my queries will hit the same tuples.
>>
>> I fail to see how this should have any direct effect on query time?
>
> Probably not, in truth, but with the statistics-based planner I'm
> occasionally surprised by what can happen.
>
>>> In this sort of test it's often a good idea to TRUNCATE the table before
>>> populating it with a newly generated data set. That helps avoid any
>>> residual effects from table bloat etc from lingering between test runs.
>> As you could see in the scripts, the table is dropped just before its
>> recreated and filled with data.
>>
>> Did you try to re-run the test?
>
> No, I didn't. I thought it worth checking if bloat might be the result
> first, though I should've read the scripts to confirm you weren't
> already handling that possibility.
>
> Anyway, I've done a run to generate your data set and run a test. After
> executing the test statement twice (once with and once without
> enable_seqscan) to make sure all data is in cache and not being read
> from disk, when I run the tests here are my results:
>
>
> test=> set enable_seqscan=on;
> SET
> test=> explain analyze select id from ftstest where body_fts @@
> to_tsquery('commonterm80');

Here you should search for "commonterm" not "commonterm80", commonterm
will go into a seq-scan. You're not testing the same thing as I did.

> Any chance your disk cache was cold on the first test run, so Pg was
> having to read the table from disk during the seqscan, and could just
> use shared_buffers when you repeated the test for the index scan?

they were run repeatedly.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig Ringer 2009-10-27 06:14:37 Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).
Previous Message Craig Ringer 2009-10-27 05:33:54 Re: bitmap heap scan way cheaper than seq scan on the same amount of tuples (fts-search).