Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans

From: Victor Blomqvist <vb(at)viblo(dot)se>
To: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
Cc: PostgreSQL General Discussion Forum <pgsql-general(at)postgresql(dot)org>
Subject: Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans
Date: 2016-08-19 06:25:50
Message-ID: CAL870DXqOtmGNdnvQucbobQDU85eDv=0Y8q75JZ9GqJPeG_iZQ@mail.gmail.com
Lists: pgsql-general

On Fri, Aug 19, 2016 at 1:31 PM, Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
wrote:

>
>
> On Fri, 19 Aug 2016, 1:07 p.m. Victor Blomqvist, <vb(at)viblo(dot)se> wrote:
>
>> Hi,
>>
>> Is it possible to break/limit a query so that it returns whatever results
>> it has found after checking X rows in an index scan?
>>
>> For example:
>> create table a(id int primary key);
>> insert into a select * from generate_series(1,100000);
>>
>> select * from a
>> where id%2 = 0
>> order by id limit 10
>>
>> In this case the query will "visit" 20 rows and filter out 10 of them. We
>> can see that in the query plan:
>> "Rows Removed by Filter: 10"
>> "Heap Fetches: 20"
>>
>> Is it somehow possible to limit this query so that it performs at most X
>> heap fetches? In my example, if we limited it to 10 Heap Fetches the query
>> would return the first 5 rows.
>>
>>
>> My use case is a table with 35 million rows and a geo index, where I want
>> to do a KNN search but also limit the query on some other parameters. In
>> some cases the other parameters restrict the query so much that Heap
>> Fetches grows to several hundred thousand or more, and in those cases I
>> would like to have a limit on my query.
>>
>
> Have you checked the TABLESAMPLE clause in v9.5?
>
> https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation
>
>
Unless I misunderstand what you mean or how it works, I can't really see how
it would help.

I want my query to still return the "best" results, and I want it to use the
index for that. Randomly sampling from the whole table would either cover too
small a subset of the rows or be too slow.

So, given my query above, in the normal ("slow") case I would find the first
10 even rows:
2,4,6,8,10,12,14,16,18,20
If I could restrict the heap fetches to 10 I would find
2,4,6,8,10
However, with tablesample I might end up with, for example, these rows:
15024,71914,51682,7110,61802,63390,98278,8022,34256,49220

In my use case I want the best rows (according to the order by), so just a
random sample is not good enough.
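For what it's worth, one workaround along these lines (not discussed in the
thread, so this is only a sketch) is to cap the number of rows the ordered
index scan may visit with an inner LIMIT, and apply the filter outside it.
The snippet below demonstrates the shape of that query using SQLite via
Python, purely so it is self-contained and runnable; the same SQL shape works
against the PostgreSQL table `a` from the example above:

```python
import sqlite3

# Build the example table: a(id int primary key) with ids 1..100000.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a(id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1, 100001)])

# Inner query visits at most 10 index entries (the "heap fetch budget");
# the outer query filters and orders whatever survived.
rows = con.execute("""
    SELECT id FROM (
        SELECT id FROM a ORDER BY id LIMIT 10
    )
    WHERE id % 2 = 0
    ORDER BY id
    LIMIT 10
""").fetchall()

print(rows)  # [(2,), (4,), (6,), (8,), (10,)]
```

This returns exactly the 2,4,6,8,10 result described above: you may get fewer
than 10 rows back, but the scan cost is bounded. The trade-off is that the
inner LIMIT must be chosen up front rather than adapting to how selective the
filter turns out to be.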

/Victor

>
>> Thanks!
>> /Victor
>>
> --
> --
> Best Regards
> Sameer Kumar | DB Solution Architect
> *ASHNIK PTE. LTD.*
>
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
>
> T: +65 6438 3504 | M: +65 8110 0350
>
> Skype: sameer.ashnik | www.ashnik.com
>
