From: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
To: Victor Blomqvist <vb(at)viblo(dot)se>
Cc: PostgreSQL General Discussion Forum <pgsql-general(at)postgresql(dot)org>
Subject: Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans
Date: 2016-08-19 09:16:42
Message-ID: CADp-Sm7GSO_=7iOwarX9-ScwA9GOoeiCMZxcztyY7_qqvUphWw@mail.gmail.com
Lists: pgsql-general
On Fri, Aug 19, 2016 at 2:25 PM Victor Blomqvist <vb(at)viblo(dot)se> wrote:
> On Fri, Aug 19, 2016 at 1:31 PM, Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
> wrote:
>
>>
>>
>> On Fri, 19 Aug 2016, 1:07 p.m. Victor Blomqvist, <vb(at)viblo(dot)se> wrote:
>>
>>> Hi,
>>>
>>> Is it possible to break/limit a query so that it returns whatever
>>> results it has found after having checked X rows in an index scan?
>>>
>>> For example:
>>> create table a(id int primary key);
>>> insert into a select * from generate_series(1,100000);
>>>
>>> select * from a
>>> where id % 2 = 0
>>> order by id limit 10;
>>>
>>> In this case the query will "visit" 20 rows and filter out 10 of them.
>>> We can see that in the query plan:
>>> "Rows Removed by Filter: 10"
>>> "Heap Fetches: 20"
>>>
>>> Is it somehow possible to limit this query so that it only fetches X
>>> rows? In my example, if we limited it to 10 heap fetches, the query
>>> could return the first 5 rows.
>>>
>>>
>
>>> My use case is a table with 35 million rows and a geo index, where I
>>> want to do a KNN search but also limit the query on some other
>>> parameters. In some cases the other parameters restrict the query so
>>> much that Heap Fetches grows to several hundred thousand or more, and
>>> in those cases I would like to put a limit on my query.
>>>
>>
>> Have you checked the TABLESAMPLE clause in v9.5?
>>
>> https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation
>>
>>
> Unless I misunderstand what you mean or how it works, I can't really see
> how it would help.
>
>
I stand corrected. TABLESAMPLE will not help you.
> I want my query to still return the "best" results, and I want it to use
> the index for that. Randomly sampling the whole table would either cover
> too small a subset of the rows, or be too slow.
>
> So, given my query above, in the normal ("slow") case I would find the 10
> first even rows:
> 2,4,6,8,10,12,14,16,18,20
> If I could restrict the heap fetches to 10 I would find
> 2,4,6,8,10
> However, with TABLESAMPLE I might end up with, for example, these rows:
> 15024,71914,51682,7110,61802,63390,98278,8022,34256,49220
>
>
How about using LIMIT?

SELECT column_1, column_2, ... FROM my_table
WHERE <<expression>>
ORDER BY my_column
LIMIT 10;
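
If the goal is to cap the rows *visited* rather than the rows returned,
one possible sketch (untested; using the example table "a" from earlier in
this thread) is to push the LIMIT into a subquery, so the index scan stops
after N entries and the filter is applied only to those:

SELECT id
FROM (
    SELECT id
    FROM a
    ORDER BY id
    LIMIT 10       -- caps the index entries visited
) AS first_ten
WHERE id % 2 = 0;

With the example data this should return 2,4,6,8,10, matching the
"restricted to 10 heap fetches" result described above, since PostgreSQL
does not push an outer WHERE clause down past a subquery's LIMIT.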
> In my use case I want the best rows (according to the order by), so just
> a random sample is not good enough.
>
> /Victor
>
>
>>
>>> Thanks!
>>> /Victor
>>>
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533
T: +65 6438 3504 | M: +65 8110 0350
Skype: sameer.ashnik | www.ashnik.com