From: Victor Blomqvist <vb(at)viblo(dot)se>
To: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
Cc: PostgreSQL General Discussion Forum <pgsql-general(at)postgresql(dot)org>
Subject: Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans
Date: 2016-08-19 06:25:50
Message-ID: CAL870DXqOtmGNdnvQucbobQDU85eDv=0Y8q75JZ9GqJPeG_iZQ@mail.gmail.com
Lists: pgsql-general
On Fri, Aug 19, 2016 at 1:31 PM, Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
wrote:
>
>
> On Fri, 19 Aug 2016, 1:07 p.m. Victor Blomqvist, <vb(at)viblo(dot)se> wrote:
>
>> Hi,
>>
>> Is it possible to break/limit a query so that it returns whatever results
>> it has found after checking X rows in an index scan?
>>
>> For example:
>> create table a(id int primary key);
>> insert into a select * from generate_series(1,100000);
>>
>> select * from a
>> where id%2 = 0
>> order by id limit 10
>>
>> In this case the query will "visit" 20 rows and filter out 10 of them. We
>> can see that in the query plan:
>> "Rows Removed by Filter: 10"
>> "Heap Fetches: 20"
>>
>> Is it somehow possible to limit this query so that it only fetches X
>> rows? In my example, if we limited it to 10 heap fetches, the query could
>> return the first 5 rows.
>>
>>
>> My use case is a table with 35 million rows and a geo index, on which I
>> want to do a KNN search while also limiting the query on some other
>> parameters. In some cases the other parameters restrict the query so much
>> that Heap Fetches climbs to several hundred thousand or more, and in
>> those cases I would like to put a limit on my query.
>>
>
> Have you checked the TABLESAMPLE clause in v9.5?
>
> https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation
>
>
Unless I misunderstand what you mean or how it works, I can't really see how
it would help.

I want my query to still return the "best" results, and I want it to use the
index for that. Randomly sampling from the whole table would either cover too
small a subset of the rows or be too slow.
So, given my query above, in the normal ("slow") case I would find the first
10 even rows:
2,4,6,8,10,12,14,16,18,20
If I could restrict the heap fetches to 10 I would find
2,4,6,8,10
However, with TABLESAMPLE I might end up with, for example, these rows:
15024,71914,51682,7110,61802,63390,98278,8022,34256,49220
In my use case I want the best rows (according to the order by), so just a
random sample is not good enough.
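One workaround (a sketch, not something tested against the real 35M-row
table: it assumes the planner keeps the inner LIMIT as a hard bound on how
many index entries are visited) is to cap the index scan in a subquery and
apply the filter outside it:

```sql
-- Bound the rows visited by limiting the ordered index scan first,
-- then filter that bounded subset. With the example table holding
-- ids 1..100000, the inner query yields ids 1..10, and the filter
-- keeps 2, 4, 6, 8, 10.
SELECT *
FROM (
    SELECT id
    FROM a
    ORDER BY id
    LIMIT 10          -- at most 10 rows fetched via the index
) AS capped
WHERE id % 2 = 0
ORDER BY id
LIMIT 10;
```

The tradeoff is that the cap applies to rows the index scan produces, not to
rows passing the filter, so the outer query may return fewer than 10 rows.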
/Victor
>
>> Thanks!
>> /Victor
>>
> --
> Best Regards
> Sameer Kumar | DB Solution Architect
> *ASHNIK PTE. LTD.*
>
> 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
>
> T: +65 6438 3504 | M: +65 8110 0350
>
> Skype: sameer.ashnik | www.ashnik.com
>