Re: Slow SELECT

From: Charles Clavadetscher <clavadetscher(at)swisspug(dot)org>
To: Frank Millman <frank(at)chagford(dot)com>
Cc: Olivier Gautherot <ogautherot(at)gautherot(dot)net>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Slow SELECT
Date: 2020-05-26 09:10:45
Message-ID: 40598c3956f865c69d04e8032305fb03@swisspug.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello

On 2020-05-26 10:38, Frank Millman wrote:
> On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
>> Hi Frank,
>>
>> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank(at)chagford(dot)com
>> <mailto:frank(at)chagford(dot)com>> wrote:
>>
>> Hi all
>>
>> I have a SELECT that runs over 5 times slower on PostgreSQL
>> compared
>> with Sql Server and sqlite3. I am trying to understand why.
>>
>> I have a table that looks like this (simplified) -
>>
>> CREATE TABLE my_table (
>>      row_id SERIAL PRIMARY KEY,
>>      deleted_id INT DEFAULT 0,
>>      fld_1 INT REFERENCES table_1(row_id),
>>      fld_2 INT REFERENCES table_2(row_id),
>>      fld_3 INT REFERENCES table_3(row_id),
>>      fld_4 INT REFERENCES table_4(row_id),
>>      tran_date DATE,
>>      tran_total DEC(21,2)
>>      );
>>
>> CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3,
>> fld_4, tran_date) WHERE deleted_id = 0;
>>
>> The table sizes are -
>>      my_table : 167 rows
>>      table_1 : 21 rows
>>      table_2 : 11 rows
>>      table_3 : 3 rows
>>      table_4 : 16 rows
>>
>> Therefore for each tran_date in my_table there are potentially
>> 21x11x3x16 = 11088 rows. Most will be null.
>>
>> I want to select the row_id for the last tran_date for each of
>> those
>> potential groups. This is my select -
>>
>>      SELECT (
>>          SELECT a.row_id FROM my_table a
>>          WHERE a.fld_1 = b.row_id
>>          AND a.fld_2 = c.row_id
>>          AND a.fld_3 = d.row_id
>>          AND a.fld_4 = e.row_id
>>          AND a.deleted_id = 0
>>          ORDER BY a.tran_date DESC LIMIT 1
>>      )
>>      FROM table_1 b, table_2 c, table_3 d, table_4 e
>>
>> Out of 11088 rows selected, 103 are not null.
>>
>> On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
>> sqlite3, and 0.31 sec on PostgreSQL.
>>
>>
>> SQL Server does a good job at caching data in memory. PostgreSQL does
>> too on consecutive calls to the same table. What execution time do you
>> get if you issue the query a second time?
>>
>> My first guess would be to add an index on my_table.tran_date and
>> check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.
>>
>> I have looked at the EXPLAIN, but I don't really know what to look
>> for.
>> I can supply it if that would help.
>>
>> Thanks for any advice.
>>
>
> Thanks Olivier. Unfortunately that did not help.
>
> I was already running the query twice and only timing the second one.
>
> I added the index on tran_date. The timing is the same, and EXPLAIN
> shows that it is using a SEQUENTIAL SCAN.
>
> Here is the EXPLAIN -
>
>
> Nested Loop (cost=0.00..64155.70 rows=11088 width=4)
> -> Nested Loop (cost=0.00..10.36 rows=528 width=12)
> -> Nested Loop (cost=0.00..2.56 rows=33 width=8)
> -> Seq Scan on table_2 c (cost=0.00..1.11 rows=11
> width=4)
> -> Materialize (cost=0.00..1.04 rows=3 width=4)
> -> Seq Scan on table_3 d (cost=0.00..1.03 rows=3
> width=4)
> -> Materialize (cost=0.00..1.24 rows=16 width=4)
> -> Seq Scan on table_4 e (cost=0.00..1.16 rows=16
> width=4)
> -> Materialize (cost=0.00..1.31 rows=21 width=4)
> -> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4)
> SubPlan 1
> -> Limit (cost=5.77..5.77 rows=1 width=8)
> -> Sort (cost=5.77..5.77 rows=1 width=8)
> Sort Key: a.tran_date DESC
> -> Seq Scan on my_table a (cost=0.00..5.76 rows=1
> width=8)
> Filter: ((fld_1 = b.row_id) AND (fld_2 =
> c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND
> (deleted_id = 0))
>
>
> Frank

If I see it correct, the query runs sequential scans on all tables, i.e.
table_1 to table_4.
Do you have an index on the referenced keys (row_id) in table_1 to
table_4?

It happens often that referenced keys are not indexed, leading to poor
execution plans.

Bye
Charles

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Charles Clavadetscher 2020-05-26 09:27:12 Re: Slow SELECT
Previous Message Frank Millman 2020-05-26 08:38:51 Re: Slow SELECT