From: Frank Millman <frank(at)chagford(dot)com>
To: Charles Clavadetscher <clavadetscher(at)swisspug(dot)org>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Slow SELECT
Date: 2020-05-26 09:32:47
Message-ID: 2c4834c3-3278-752d-91d6-8aad4efe202a@chagford.com
Lists: pgsql-general
On 2020-05-26 11:10 AM, Charles Clavadetscher wrote:
> Hello
>
> On 2020-05-26 10:38, Frank Millman wrote:
>> On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
>>> Hi Frank,
>>>
>>> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank(at)chagford(dot)com
>>> <mailto:frank(at)chagford(dot)com>> wrote:
>>>
>>> Hi all
>>>
>>> I have a SELECT that runs over 5 times slower on PostgreSQL compared
>>> with SQL Server and sqlite3. I am trying to understand why.
>>>
>>> I have a table that looks like this (simplified) -
>>>
>>> CREATE TABLE my_table (
>>>     row_id SERIAL PRIMARY KEY,
>>>     deleted_id INT DEFAULT 0,
>>>     fld_1 INT REFERENCES table_1(row_id),
>>>     fld_2 INT REFERENCES table_2(row_id),
>>>     fld_3 INT REFERENCES table_3(row_id),
>>>     fld_4 INT REFERENCES table_4(row_id),
>>>     tran_date DATE,
>>>     tran_total DEC(21,2)
>>> );
>>>
>>> CREATE UNIQUE INDEX my_table_ndx ON my_table
>>>     (fld_1, fld_2, fld_3, fld_4, tran_date) WHERE deleted_id = 0;
>>>
>>> The table sizes are -
>>> my_table : 167 rows
>>> table_1 : 21 rows
>>> table_2 : 11 rows
>>> table_3 : 3 rows
>>> table_4 : 16 rows
>>>
>>> Therefore for each tran_date in my_table there are potentially
>>> 21x11x3x16 = 11088 rows. Most will be null.
>>>
>>> I want to select the row_id for the last tran_date for each of those
>>> potential groups. This is my select -
>>>
>>> SELECT (
>>>     SELECT a.row_id FROM my_table a
>>>     WHERE a.fld_1 = b.row_id
>>>       AND a.fld_2 = c.row_id
>>>       AND a.fld_3 = d.row_id
>>>       AND a.fld_4 = e.row_id
>>>       AND a.deleted_id = 0
>>>     ORDER BY a.tran_date DESC LIMIT 1
>>> )
>>> FROM table_1 b, table_2 c, table_3 d, table_4 e
>>>
>>> Out of 11088 rows selected, 103 are not null.
>>>
>>> On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
>>> sqlite3, and 0.31 sec on PostgreSQL.
>>>
>>>
>>> SQL Server does a good job at caching data in memory. PostgreSQL does
>>> too on consecutive calls to the same table. What execution time do
>>> you get if you issue the query a second time?
>>>
>>> My first guess would be to add an index on my_table.tran_date and
>>> check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.
>>>
>>> I have looked at the EXPLAIN, but I don't really know what to
>>> look for.
>>> I can supply it if that would help.
>>>
>>> Thanks for any advice.
>>>
>>
>> Thanks Olivier. Unfortunately that did not help.
>>
>> I was already running the query twice and only timing the second one.
>>
>> I added the index on tran_date. The timing is the same, and EXPLAIN
>> shows that it is using a SEQUENTIAL SCAN.
>>
>> Here is the EXPLAIN -
>>
>>
>> Nested Loop  (cost=0.00..64155.70 rows=11088 width=4)
>>   ->  Nested Loop  (cost=0.00..10.36 rows=528 width=12)
>>         ->  Nested Loop  (cost=0.00..2.56 rows=33 width=8)
>>               ->  Seq Scan on table_2 c  (cost=0.00..1.11 rows=11 width=4)
>>               ->  Materialize  (cost=0.00..1.04 rows=3 width=4)
>>                     ->  Seq Scan on table_3 d  (cost=0.00..1.03 rows=3 width=4)
>>         ->  Materialize  (cost=0.00..1.24 rows=16 width=4)
>>               ->  Seq Scan on table_4 e  (cost=0.00..1.16 rows=16 width=4)
>>   ->  Materialize  (cost=0.00..1.31 rows=21 width=4)
>>         ->  Seq Scan on table_1 b  (cost=0.00..1.21 rows=21 width=4)
>>   SubPlan 1
>>     ->  Limit  (cost=5.77..5.77 rows=1 width=8)
>>           ->  Sort  (cost=5.77..5.77 rows=1 width=8)
>>                 Sort Key: a.tran_date DESC
>>                 ->  Seq Scan on my_table a  (cost=0.00..5.76 rows=1 width=8)
>>                       Filter: ((fld_1 = b.row_id) AND (fld_2 = c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND (deleted_id = 0))
>>
>>
>> Frank
>
> If I see it correctly, the query runs sequential scans on all tables, i.e.
> table_1 to table_4.
> Do you have an index on the referenced keys (row_id) in table_1 to table_4?
>
> It often happens that referenced keys are not indexed, leading to poor
> execution plans.
>
The referenced keys are all defined as SERIAL PRIMARY KEY in their own
tables, so I presume that they are all indexed automatically.
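
If anyone wants to double-check that, I believe a catalog query along these
lines would list whatever indexes PostgreSQL created behind the scenes (just
a sketch, using the table names from my simplified example) -

    -- list the indexes that exist on the example tables
    SELECT tablename, indexname, indexdef
    FROM pg_indexes
    WHERE tablename IN ('my_table', 'table_1', 'table_2', 'table_3', 'table_4');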
On the other hand, there are not many rows in those tables, so the
planner may decide not to use the index in that case.
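
To rule out the small-table effect, I suppose one could temporarily
discourage sequential scans and re-run the query. This is an untested
sketch - as far as I know, enable_seqscan only makes seq scans look very
expensive to the planner for the current session, it does not forbid them -

    -- discourage sequential scans for this session only
    SET enable_seqscan = off;

    EXPLAIN ANALYZE
    SELECT (
        SELECT a.row_id FROM my_table a
        WHERE a.fld_1 = b.row_id
          AND a.fld_2 = c.row_id
          AND a.fld_3 = d.row_id
          AND a.fld_4 = e.row_id
          AND a.deleted_id = 0
        ORDER BY a.tran_date DESC LIMIT 1
    )
    FROM table_1 b, table_2 c, table_3 d, table_4 e;

    -- restore the default planner behaviour
    RESET enable_seqscan;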
Frank