Quick Links

CTE with JOIN of two tables is much faster than a regular query

From:	<kpi6288(at)gmail(dot)com>
To:	<pgsql-general(at)postgresql(dot)org>
Subject:	CTE with JOIN of two tables is much faster than a regular query
Date:	2018-08-18 09:36:53
Message-ID:	004b01d436d6$ff96c840$fec458c0$@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Running PostgreSQL 9.5 on Windows.

The CTE mentioned below completes the query in 4.5 seconds while the regular
query takes 66 seconds. I read from EXPLAIN ANALYSE that the regular query
starts with a full table scan over "Doc" while the CTE joins the two tables
first and applies the filter condition in the 2nd step.

I believe that some rows in "Doc" which are not referenced by "F" contain a
large amount of data in the field "szText" and this will slow down the ILIKE
operator.

What can I do to improve the performance of the regular query without using
a CTE?

This is a much simplified extract from a larger application:

CREATE TABLE Doc (

oID UUID NOT NULL PRIMARY KEY,

uDocID UUID NOT NULL UNIQUE,

szText TEXT

);

CREATE TABLE F (

oID UUID NOT NULL PRIMARY KEY,

uDocRef UUID,

CONSTRAINT F_fkey1 FOREIGN KEY (uDocRef) REFERENCES Doc (uDocID)

);

-- just in case .

ALTER TABLE Doc ALTER uDocID SET STATISTICS 10000;

ALTER TABLE Doc ALTER szText SET STATISTICS 10000;

VACUUM ANALYSE Doc;

SELECT COUNT(*) FROM Doc;

=> 125946 records

ALTER TABLE F ALTER uDocRef SET STATISTICS 10000;

VACUUM ANALYSE F;

SELECT COUNT(*) FROM F;

=> 32605 records

Result with CTE:

EXPLAIN ANALYSE

WITH a AS (

SELECT F.oID, Doc.szText

FROM F

JOIN Doc ON F.uDocRef = Doc.udocid

)

SELECT *

FROM a

WHERE szText ILIKE '%480GB%';

"CTE Scan on a (cost=9463.42..10197.03 rows=52 width=48) (actual
time=478.770..4551.613 rows=10 loops=1)"

" Filter: (sztext ~~* '%480GB%'::text)"

" Rows Removed by Filter: 32595"

" CTE a"

" -> Hash Join (cost=973.61..9463.42 rows=32605 width=359) (actual
time=36.998..100.337 rows=32605 loops=1)"

" Hash Cond: (doc.udocid = f.udocref)"

" -> Seq Scan on doc (cost=0.00..7691.46 rows=125946 width=359)
(actual time=0.008..18.269 rows=125946 loops=1)"

" -> Hash (cost=566.05..566.05 rows=32605 width=32) (actual
time=35.825..35.825 rows=32605 loops=1)"

" Buckets: 32768 Batches: 1 Memory Usage: 2294kB"

" -> Seq Scan on f (cost=0.00..566.05 rows=32605 width=32)
(actual time=0.005..14.677 rows=32605 loops=1)"

"Planning time: 4.689 ms"

"Execution time: 4554.893 ms"

Result with regular query:

EXPLAIN ANALYSE

SELECT F.oID, Doc.szText

FROM F

JOIN Doc ON F.uDocRef = Doc.udocid

WHERE szText ILIKE '%480GB%';

"Hash Join (cost=8006.56..8694.93 rows=5 width=359) (actual
time=66500.415..66506.978 rows=10 loops=1)"

" Hash Cond: (f.udocref = doc.udocid)"

" -> Seq Scan on f (cost=0.00..566.05 rows=32605 width=32) (actual
time=0.002..3.143 rows=32605 loops=1)"

" -> Hash (cost=8006.32..8006.32 rows=19 width=359) (actual
time=66500.023..66500.023 rows=16 loops=1)"

" Buckets: 1024 Batches: 1 Memory Usage: 19kB"

" -> Seq Scan on doc (cost=0.00..8006.32 rows=19 width=359) (actual
time=8864.720..66499.991 rows=16 loops=1)"

" Filter: (sztext ~~* '%480GB%'::text)"

" Rows Removed by Filter: 125930"

"Planning time: 263.542 ms"

"Execution time: 66507.003 ms"

Responses

Re: CTE with JOIN of two tables is much faster than a regular query at 2018-08-18 10:26:30 from Andreas Kretschmer
Re: CTE with JOIN of two tables is much faster than a regular query at 2018-08-18 14:38:48 from Stephen Frost
Re: CTE with JOIN of two tables is much faster than a regular query at 2018-08-18 16:25:14 from Ravi Krishna
(was: CTE with JOIN of two tables is much faster than a regular query) at 2018-08-20 19:08:54 from Albrecht Dreß

Browse pgsql-general by date

	From	Date	Subject
Next Message	Andreas Kretschmer	2018-08-18 10:26:30	Re: CTE with JOIN of two tables is much faster than a regular query
Previous Message	김세훈	2018-08-18 05:09:06	Re: using graph model with PostgreSQL