Re: Slow SELECT

From: Charles Clavadetscher <clavadetscher(at)swisspug(dot)org>
To: Frank Millman <frank(at)chagford(dot)com>
Cc: Olivier Gautherot <ogautherot(at)gautherot(dot)net>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Slow SELECT
Date: 2020-05-26 09:27:12
Message-ID: 43655b03c50ff22bb076b5946c7cec76@swisspug.org
Lists: pgsql-general

On 2020-05-26 11:10, Charles Clavadetscher wrote:
> Hello
>
> On 2020-05-26 10:38, Frank Millman wrote:
>> On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
>>> Hi Frank,
>>>
>>> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank(at)chagford(dot)com
>>> <mailto:frank(at)chagford(dot)com>> wrote:
>>>
>>> Hi all
>>>
>>> I have a SELECT that runs over 5 times slower on PostgreSQL
>>> compared with SQL Server and sqlite3. I am trying to understand why.
>>>
>>> I have a table that looks like this (simplified) -
>>>
>>> CREATE TABLE my_table (
>>>      row_id SERIAL PRIMARY KEY,
>>>      deleted_id INT DEFAULT 0,
>>>      fld_1 INT REFERENCES table_1(row_id),
>>>      fld_2 INT REFERENCES table_2(row_id),
>>>      fld_3 INT REFERENCES table_3(row_id),
>>>      fld_4 INT REFERENCES table_4(row_id),
>>>      tran_date DATE,
>>>      tran_total DEC(21,2)
>>>      );
>>>
>>> CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2,
>>> fld_3,
>>> fld_4, tran_date) WHERE deleted_id = 0;
>>>
>>> The table sizes are -
>>>      my_table : 167 rows
>>>      table_1 : 21 rows
>>>      table_2 : 11 rows
>>>      table_3 : 3 rows
>>>      table_4 : 16 rows
>>>
>>> Therefore for each tran_date in my_table there are potentially
>>> 21x11x3x16 = 11088 rows. Most will be null.
>>>
>>> I want to select the row_id for the last tran_date for each of
>>> those
>>> potential groups. This is my select -
>>>
>>>      SELECT (
>>>          SELECT a.row_id FROM my_table a
>>>          WHERE a.fld_1 = b.row_id
>>>          AND a.fld_2 = c.row_id
>>>          AND a.fld_3 = d.row_id
>>>          AND a.fld_4 = e.row_id
>>>          AND a.deleted_id = 0
>>>          ORDER BY a.tran_date DESC LIMIT 1
>>>      )
>>>      FROM table_1 b, table_2 c, table_3 d, table_4 e
>>>
>>> Out of 11088 rows selected, 103 are not null.
>>>
>>> On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
>>> sqlite3, and 0.31 sec on PostgreSQL.
>>>
>>>
>>> SQL Server does a good job at caching data in memory. PostgreSQL does
>>> too on consecutive calls to the same table. What execution time do
>>> you get if you issue the query a second time?
>>>
>>> My first guess would be to add an index on my_table.tran_date and
>>> check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.
>>>
>>> I have looked at the EXPLAIN, but I don't really know what to
>>> look for.
>>> I can supply it if that would help.
>>>
>>> Thanks for any advice.
>>>
>>
>> Thanks Olivier. Unfortunately that did not help.
>>
>> I was already running the query twice and only timing the second one.
>>
>> I added the index on tran_date. The timing is the same, and EXPLAIN
>> shows that it is using a SEQUENTIAL SCAN.
>>
>> Here is the EXPLAIN -
>>
>>
>> Nested Loop  (cost=0.00..64155.70 rows=11088 width=4)
>>   ->  Nested Loop  (cost=0.00..10.36 rows=528 width=12)
>>         ->  Nested Loop  (cost=0.00..2.56 rows=33 width=8)
>>               ->  Seq Scan on table_2 c  (cost=0.00..1.11 rows=11 width=4)
>>               ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
>>                     ->  Seq Scan on table_3 d  (cost=0.00..1.03 rows=3 width=4)
>>         ->  Materialize  (cost=0.00..1.24 rows=16 width=4)
>>               ->  Seq Scan on table_4 e  (cost=0.00..1.16 rows=16 width=4)
>>   ->  Materialize  (cost=0.00..1.31 rows=21 width=4)
>>         ->  Seq Scan on table_1 b  (cost=0.00..1.21 rows=21 width=4)
>>   SubPlan 1
>>     ->  Limit  (cost=5.77..5.77 rows=1 width=8)
>>           ->  Sort  (cost=5.77..5.77 rows=1 width=8)
>>                 Sort Key: a.tran_date DESC
>>                 ->  Seq Scan on my_table a  (cost=0.00..5.76 rows=1 width=8)
>>                       Filter: ((fld_1 = b.row_id) AND (fld_2 = c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND (deleted_id = 0))
>>
>>
>> Frank
>
> If I see it correctly, the query runs sequential scans on all tables,
> i.e. table_1 to table_4.
> Do you have an index on the referencing keys (fld_1 to fld_4) in
> my_table?
>
> It often happens that foreign key columns are not indexed, leading to
> poor execution plans.
>
> Bye
> Charles

I noticed later that you have very small tables. This will probably lead
to a sequential scan although there is an index in place.

I am not sure if it makes a difference, but what about using explicit
joins?

SELECT a.row_id FROM my_table a
JOIN table_1 b ON (b.row_id = a.fld_1)
JOIN table_2 c ON (c.row_id = a.fld_2)
JOIN table_3 d ON (d.row_id = a.fld_3)
JOIN table_4 e ON (e.row_id = a.fld_4)
WHERE a.deleted_id = 0
ORDER BY a.tran_date DESC LIMIT 1;
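
Another thought: since the goal seems to be the latest row_id per
(fld_1, fld_2, fld_3, fld_4) combination, PostgreSQL's DISTINCT ON may
express that more directly than a correlated subquery. A sketch, based
on the schema you posted (untested against your data):

```sql
SELECT DISTINCT ON (a.fld_1, a.fld_2, a.fld_3, a.fld_4)
       a.row_id
FROM my_table a
WHERE a.deleted_id = 0
ORDER BY a.fld_1, a.fld_2, a.fld_3, a.fld_4, a.tran_date DESC;
```

Note that this returns only the 103 groups that actually have rows,
not all 11088 potential combinations, so it only fits if you do not
need the NULL results.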

Regards
Charles
