From: | Charles Clavadetscher <clavadetscher(at)swisspug(dot)org> |
---|---|
To: | Frank Millman <frank(at)chagford(dot)com> |
Cc: | Olivier Gautherot <ogautherot(at)gautherot(dot)net>, Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Slow SELECT |
Date: | 2020-05-26 09:10:45 |
Message-ID: | 40598c3956f865c69d04e8032305fb03@swisspug.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello
On 2020-05-26 10:38, Frank Millman wrote:
> On 2020-05-26 9:32 AM, Olivier Gautherot wrote:
>> Hi Frank,
>>
>> On Tue, May 26, 2020 at 9:23 AM Frank Millman <frank(at)chagford(dot)com
>> <mailto:frank(at)chagford(dot)com>> wrote:
>>
>> Hi all
>>
>> I have a SELECT that runs over 5 times slower on PostgreSQL
>> compared
>> with Sql Server and sqlite3. I am trying to understand why.
>>
>> I have a table that looks like this (simplified) -
>>
>> CREATE TABLE my_table (
>> row_id SERIAL PRIMARY KEY,
>> deleted_id INT DEFAULT 0,
>> fld_1 INT REFERENCES table_1(row_id),
>> fld_2 INT REFERENCES table_2(row_id),
>> fld_3 INT REFERENCES table_3(row_id),
>> fld_4 INT REFERENCES table_4(row_id),
>> tran_date DATE,
>> tran_total DEC(21,2)
>> );
>>
>> CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3,
>> fld_4, tran_date) WHERE deleted_id = 0;
>>
>> The table sizes are -
>> my_table : 167 rows
>> table_1 : 21 rows
>> table_2 : 11 rows
>> table_3 : 3 rows
>> table_4 : 16 rows
>>
>> Therefore for each tran_date in my_table there are potentially
>> 21x11x3x16 = 11088 rows. Most will be null.
>>
>> I want to select the row_id for the last tran_date for each of
>> those
>> potential groups. This is my select -
>>
>> SELECT (
>> SELECT a.row_id FROM my_table a
>> WHERE a.fld_1 = b.row_id
>> AND a.fld_2 = c.row_id
>> AND a.fld_3 = d.row_id
>> AND a.fld_4 = e.row_id
>> AND a.deleted_id = 0
>> ORDER BY a.tran_date DESC LIMIT 1
>> )
>> FROM table_1 b, table_2 c, table_3 d, table_4 e
>>
>> Out of 11088 rows selected, 103 are not null.
>>
>> On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on
>> sqlite3, and 0.31 sec on PostgreSQL.
>>
>>
>> SQL Server does a good job at caching data in memory. PostgreSQL does
>> too on consecutive calls to the same table. What execution time do you
>> get if you issue the query a second time?
>>
>> My first guess would be to add an index on my_table.tran_date and
>> check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table.
>>
>> I have looked at the EXPLAIN, but I don't really know what to look
>> for.
>> I can supply it if that would help.
>>
>> Thanks for any advice.
>>
>
> Thanks Olivier. Unfortunately that did not help.
>
> I was already running the query twice and only timing the second one.
>
> I added the index on tran_date. The timing is the same, and EXPLAIN
> shows that it is using a SEQUENTIAL SCAN.
>
> Here is the EXPLAIN -
>
>
> Nested Loop (cost=0.00..64155.70 rows=11088 width=4)
> -> Nested Loop (cost=0.00..10.36 rows=528 width=12)
> -> Nested Loop (cost=0.00..2.56 rows=33 width=8)
> -> Seq Scan on table_2 c (cost=0.00..1.11 rows=11
> width=4)
> -> Materialize (cost=0.00..1.04 rows=3 width=4)
> -> Seq Scan on table_3 d (cost=0.00..1.03 rows=3
> width=4)
> -> Materialize (cost=0.00..1.24 rows=16 width=4)
> -> Seq Scan on table_4 e (cost=0.00..1.16 rows=16
> width=4)
> -> Materialize (cost=0.00..1.31 rows=21 width=4)
> -> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4)
> SubPlan 1
> -> Limit (cost=5.77..5.77 rows=1 width=8)
> -> Sort (cost=5.77..5.77 rows=1 width=8)
> Sort Key: a.tran_date DESC
> -> Seq Scan on my_table a (cost=0.00..5.76 rows=1
> width=8)
> Filter: ((fld_1 = b.row_id) AND (fld_2 =
> c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND
> (deleted_id = 0))
>
>
> Frank
If I see it correct, the query runs sequential scans on all tables, i.e.
table_1 to table_4.
Do you have an index on the referenced keys (row_id) in table_1 to
table_4?
It happens often that referenced keys are not indexed, leading to poor
execution plans.
Bye
Charles
From | Date | Subject | |
---|---|---|---|
Next Message | Charles Clavadetscher | 2020-05-26 09:27:12 | Re: Slow SELECT |
Previous Message | Frank Millman | 2020-05-26 08:38:51 | Re: Slow SELECT |