Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans

From: Victor Blomqvist <vb(at)viblo(dot)se>
To: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
Cc: PostgreSQL General Discussion Forum <pgsql-general(at)postgresql(dot)org>
Subject: Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans
Date: 2016-08-19 06:25:50
Message-ID: CAL870DXqOtmGNdnvQucbobQDU85eDv=0Y8q75JZ9GqJPeG_iZQ@mail.gmail.com
Lists: pgsql-general

On Fri, Aug 19, 2016 at 1:31 PM, Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
wrote:

>
>
> On Fri, 19 Aug 2016, 1:07 p.m. Victor Blomqvist, <vb(at)viblo(dot)se> wrote:
>
>> Hi,
>>
>> Is it possible to break/limit a query so that it returns whatever results
>> it has found after checking X rows in an index scan?
>>
>> For example:
>> create table a(id int primary key);
>> insert into a select * from generate_series(1,100000);
>>
>> select * from a
>> where id%2 = 0
>> order by id limit 10
>>
>> In this case the query will "visit" 20 rows and filter out 10 of them. We
>> can see that in the query plan:
>> "Rows Removed by Filter: 10"
>> "Heap Fetches: 20"
>>
>> Is it somehow possible to limit this query so that it performs at most X
>> heap fetches? In my example, if we limited it to 10 Heap Fetches the query
>> would return the first 5 rows.
>>
>>
>> My use case is a table with 35 million rows and a geo index, where I want
>> to do a KNN search but also limit the query on some other parameters. In
>> some cases the other parameters restrict the query so much that Heap
>> Fetches grows to several hundred thousand or more, and in those cases I
>> would like to have a limit on my query.
>>
>
> Have you checked the TABLESAMPLE clause in v9.5?
>
> https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation
>
>
Unless I misunderstand what you mean or how it works, I can't really see how
it would help.

I want my query to still return the "best" results, and I want it to use the
index for that. Randomly sampling from the whole table would either cover too
small a subset of the rows or be too slow.

So, given my query above, in the normal ("slow") case I would find the first
10 even rows:
2,4,6,8,10,12,14,16,18,20
If I could restrict the heap fetches to 10 I would find
2,4,6,8,10
However, with tablesample I might end up with, for example, these rows:
15024,71914,51682,7110,61802,63390,98278,8022,34256,49220

In my use case I want the best rows (according to the order by), so just a
random sample is not good enough.
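For what it's worth, one workaround along these lines (not discussed in the
thread, so this is only a sketch) is to cap the number of rows the ordered
index scan may visit with an inner LIMIT, and apply the filter outside it.
The snippet below demonstrates the shape of that query using SQLite via
Python, purely so it is self-contained and runnable; the same SQL shape works
against the PostgreSQL table `a` from the example above:

```python
import sqlite3

# Build the example table: a(id int primary key) with ids 1..100000.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a(id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1, 100001)])

# Inner query visits at most 10 index entries (the "heap fetch budget");
# the outer query filters and orders whatever survived.
rows = con.execute("""
    SELECT id FROM (
        SELECT id FROM a ORDER BY id LIMIT 10
    )
    WHERE id % 2 = 0
    ORDER BY id
    LIMIT 10
""").fetchall()

print(rows)  # [(2,), (4,), (6,), (8,), (10,)]
```

This returns exactly the 2,4,6,8,10 result described above: you may get fewer
than 10 rows back, but the scan cost is bounded. The trade-off is that the
inner LIMIT must be chosen up front rather than adapting to how selective the
filter turns out to be.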

/Victor

>
>> Thanks!
>> /Victor
>>
> --
> --
> Best Regards
> Sameer Kumar | DB Solution Architect
> *ASHNIK PTE. LTD.*
>
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
>
> T: +65 6438 3504 | M: +65 8110 0350
>
> Skype: sameer.ashnik | www.ashnik.com
>
