Re: seq vs index scan in join query

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Emanuel Alvarez <ema(at)abductedcow(dot)com(dot)ar>, pgsql-general(at)postgresql(dot)org
Subject: Re: seq vs index scan in join query
Date: 2017-11-29 17:17:18
Message-ID: 1511975838.2309.12.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Emanuel Alvarez wrote:
> the problematic query looks like this:
>
> SELECT keywords.strategy_id, results.position, results.created_at FROM results
> JOIN keywords ON results.keyword_id = keywords.id
> WHERE results.account_id = 1
> AND results.created_at >= '2017-10-25 00:00:00.000000'
> AND results.created_at <= '2017-11-10 23:59:59.999999';
>
>
> as you can see in the query plan [1] a sequential scan is preferred.
> as we understand it, this happens because the number of rows returned
> from results is too large. if we reduce this number by either
> selecting a smaller created_at range, or another account_id with fewer
> keywords, the planner falls back to an index scan, confirming that the
> number of rows returned from results has a direct influence in this
> choice.
>
> on the other hand, if we disable sequential scans (SET enable_seqscan
> = 0), we see than not only the query runs faster but the cost seems to
> be lower, as seen in the query plan [2].

The optimizer is right here.

Even though your second execution without sequential scans ran faster,
it is worse.

That is because the execution with the sequential scan touched
26492 + 80492 = 106984 blocks, while the second execution touched
311301 + 48510 = 359811 blocks, more than three times as many.

The second execution was just lucky because most of these blocks were
already cached, and it had to read only half as many blocks from disk.

If you repeat the execution a couple of times, you should see that
the execution using the sequential scans becomes faster.

You can boost performance even more by increasing work_mem
so that the hash can be created in memory.

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2017-11-29 17:46:40 Re: seq vs index scan in join query
Previous Message Steven Lembark 2017-11-29 17:03:17 Re: large numbers of inserts out of memory strategy