Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans

From: Victor Blomqvist <vb(at)viblo(dot)se>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans
Date: 2016-08-19 18:01:57
Message-ID: CAL870DVEaNJkFv7bk5EnsZ3vDdP2DQe8iLgLfqJpJi_rk-0yHA@mail.gmail.com
Lists: pgsql-general

On Sat, Aug 20, 2016 at 1:13 AM, Francisco Olarte <folarte(at)peoplecall(dot)com>
wrote:

> Hi Victor:
>
>
> On Fri, Aug 19, 2016 at 7:02 PM, Victor Blomqvist <vb(at)viblo(dot)se> wrote:
> > What I want to avoid is my query visiting the whole 1m rows to get a
> result,
> > because in my real table that can take 100sec. At the same time I want
> the
> > queries that only need to visit 1k rows finish quickly, and the queries
> that
> > visit 100k rows at least get some result back.
>
> You are going to have problems with that. If you just want to limit it
> to max 100k rows, max 10 results my solution works, probably better as
> nested selects than CTEs, but someone more knowledgeable in the
> optimizer will need to say something ( or tests will be needed ). This
> is because "the queries that visit 100k rows at least get some result
> back" may be false: you may need to visit the whole 1M to get the
> first result if you are unlucky. Just set ap=999 where id=1M and ask
> for ap>=999 and you've got that degenerate case, which can only be
> saved if you have an index on ap ( even with statistics, you would
> need a full table scan to find it ).
>
> If you are positive some results are in the first 100k rows, then my
> method works fine, how fast will need to be tested with the real data.
> You can even try using *10, *100, *1k of the real limit until you have
> enough results if you want to time-limit your queries.
>
>
> Francisco Olarte.
>

Thanks! A subselect seems to do it.

I didn't think of it before; I guess I got blinded by the CTE, since usually
it's the other way around and the CTE is the answer to the problem.
But it seems the easy solution with a good old subselect fixes it.
Now I feel a bit embarrassed by such an easy answer :)

Checking these two queries I can see that the first one visits the
max 50 rows it's allowed to and returns 5 rows, while the second one
finishes after fetching 13 rows and returns the full 10 rows.

select *
from (select * from b order by id limit 50) x
where age_preference%10 < 1
order by id limit 10

select *
from (select * from b order by id limit 50) x
where age_preference%10 < 5
order by id limit 10
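For anyone who wants to play with the pattern outside Postgres, here is a
small self-contained sketch using Python's sqlite3. The schema and data for
table "b" are invented just for illustration (the real table and its
distribution are not shown in this thread); the point is only the
scan-limiting subselect itself:

```python
import sqlite3

# Toy stand-in for table "b"; schema and data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table b (id integer primary key, age_preference integer)")
conn.executemany("insert into b values (?, ?)",
                 [(i, i % 100) for i in range(1, 1001)])

# The inner subselect caps the rows examined at 50; the outer query
# filters within that window and returns at most 10 matches.
sparse = conn.execute("""
    select * from (select * from b order by id limit 50) x
    where age_preference % 10 < 1
    order by id limit 10
""").fetchall()

dense = conn.execute("""
    select * from (select * from b order by id limit 50) x
    where age_preference % 10 < 5
    order by id limit 10
""").fetchall()

print(len(sparse), len(dense))
```

With this toy data the sparse filter finds only 5 matches inside the 50-row
window (ids 10, 20, 30, 40, 50), while the dense one fills its limit of 10 —
the same shape of behaviour described above, though the exact row counts
depend entirely on the data.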

/Victor
