Re: Very slow query (3-4mn) on a table with 25millions rows

From: Abadie Lana <Lana(dot)Abadie(at)iter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very slow query (3-4mn) on a table with 25millions rows
Date: 2016-07-26 09:01:24
Message-ID: E544BB9A64ABD24DA201745FD316D94551124F74@XCH2.iter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Tom,
Thanks for the hints..

I made various tests for index
The best I could get is the following one with
create index vat_funcvaratt_multi_idx on functionalvarattributes(split_part(split_part(attvalue,' ',1),'.',1), tag_id, atttype_id);
analyze functionalvarattributes;

explain analyze select s.attvalue from functionalvarattributes s, tags t, variableattributetypes vat where t.id=s.tag_id and t.status!='Internal'and vat.fieldtype in ('DBF_INLINK','DBF_OUTLINK','DBF_FWDLINK') and vat.id=s.atttype_id and split_part(split_part(s.attvalue,' ',1),'.',1) in (select e.name from functionalvariables e, usertemplatevariable ut where e.usertemplatevar_id=ut.id and ut.usertempl_id=15) except select s.attvalue from functionalvarattributes s, tags t, usertemplvarattribute utva, usertemplatevariable utv, variableattributetypes vat where vat.fieldtype in ('DBF_INLINK','DBF_OUTLINK','DBF_FWDLINK') and vat.id=s.atttype_id and utv.id=utva.usertempvariable_fk and utv.usertempl_id=15 and t.id=s.tag_id and t.status!='Internal'and split_part(split_part(s.attvalue,' ',1),'.',1) in (select e.name from functionalvariables e, usertemplatevariable ut where e.usertemplatevar_id=ut.id and ut.usertempl_id=15);
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------
HashSetOp Except (cost=171505.51..2361978.74 rows=1116 width=8) (actual time=66476.682..66476.682 rows=0 loops=1)
-> Append (cost=171505.51..2251949.02 rows=44011889 width=8) (actual time=12511.639..66476.544 rows=320 loops=1)
-> Subquery Scan on "*SELECT* 1" (cost=171505.51..907368.77 rows=310121 width=8) (actual time=12511.638..31775.404 rows=2 lo
ops=1)
-> Hash Join (cost=171505.51..904267.56 rows=310121 width=8) (actual time=12511.636..31775.401 rows=2 loops=1)
Hash Cond: (split_part(split_part((s.attvalue)::text, ' '::text, 1), '.'::text, 1) = (e.name)::text)
-> Hash Join (cost=193.91..726325.20 rows=310121 width=8) (actual time=1.227..24083.777 rows=308287 loops=1)
Hash Cond: (s.tag_id = t.id)
-> Hash Join (cost=188.03..716951.08 rows=1671210 width=16) (actual time=1.157..23810.490 rows=651155 loop
s=1)
Hash Cond: (s.atttype_id = vat.id)
-> Seq Scan on functionalvarattributes s (cost=0.00..604688.60 rows=25429960 width=24) (actual time=
0.002..15719.449 rows=25429808 loops=1)
-> Hash (cost=183.18..183.18 rows=388 width=8) (actual time=1.116..1.116 rows=388 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
-> Seq Scan on variableattributetypes vat (cost=0.00..183.18 rows=388 width=8) (actual time=0.
005..0.987 rows=388 loops=1)
Filter: ((fieldtype)::text = ANY ('{DBF_INLINK,DBF_OUTLINK,DBF_FWDLINK}'::text[]))
Rows Removed by Filter: 5516
-> Hash (cost=5.43..5.43 rows=36 width=8) (actual time=0.064..0.064 rows=36 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
-> Seq Scan on tags t (cost=0.00..5.43 rows=36 width=8) (actual time=0.008..0.055 rows=36 loops=1)
Filter: ((status)::text <> 'Internal'::text)
Rows Removed by Filter: 158
-> Hash (cost=171250.07..171250.07 rows=4923 width=24) (actual time=7377.344..7377.344 rows=16 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> HashAggregate (cost=171200.84..171250.07 rows=4923 width=24) (actual time=7377.310..7377.329 rows=16 lo
ops=1)
-> Hash Join (cost=8.95..171188.53 rows=4923 width=24) (actual time=3.178..7377.271 rows=48 loops=1)
Hash Cond: (e.usertemplatevar_id = ut.id)
-> Seq Scan on functionalvariables e (cost=0.00..155513.07 rows=4164607 width=32) (actual time
=1.271..5246.277 rows=4164350 loops=1)
-> Hash (cost=8.75..8.75 rows=16 width=8) (actual time=0.026..0.026 rows=16 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Index Scan using usertemp_utv_idx on usertemplatevariable ut (cost=0.29..8.75 rows=16
width=8) (actual time=0.011..0.020 rows=16 loops=1)
Index Cond: (usertempl_id = 15)
-> Subquery Scan on "*SELECT* 2" (cost=172514.13..1344580.25 rows=43701768 width=8) (actual time=11551.477..34701.030 rows=3
18 loops=1)
-> Hash Join (cost=172514.13..907562.57 rows=43701768 width=8) (actual time=11551.475..34700.876 rows=318 loops=1)
Hash Cond: (split_part(split_part((s_1.attvalue)::text, ' '::text, 1), '.'::text, 1) = (e_1.name)::text)
-> Hash Join (cost=193.91..726325.20 rows=310121 width=8) (actual time=1.281..27733.991 rows=308287 loops=1)
Hash Cond: (s_1.tag_id = t_1.id)
-> Hash Join (cost=188.03..716951.08 rows=1671210 width=16) (actual time=1.194..27391.475 rows=651155 loop
s=1)
Hash Cond: (s_1.atttype_id = vat_1.id)
-> Seq Scan on functionalvarattributes s_1 (cost=0.00..604688.60 rows=25429960 width=24) (actual tim
e=0.001..17189.172 rows=25429808 loops=1)
-> Hash (cost=183.18..183.18 rows=388 width=8) (actual time=1.153..1.153 rows=388 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 16kB
-> Seq Scan on variableattributetypes vat_1 (cost=0.00..183.18 rows=388 width=8) (actual time=
0.007..1.015 rows=388 loops=1)
Filter: ((fieldtype)::text = ANY ('{DBF_INLINK,DBF_OUTLINK,DBF_FWDLINK}'::text[]))
Rows Removed by Filter: 5516
-> Hash (cost=5.43..5.43 rows=36 width=8) (actual time=0.065..0.065 rows=36 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 2kB
-> Seq Scan on tags t_1 (cost=0.00..5.43 rows=36 width=8) (actual time=0.010..0.053 rows=36 loops=1)
Filter: ((status)::text <> 'Internal'::text)
Rows Removed by Filter: 158
-> Hash (cost=172318.46..172318.46 rows=141 width=24) (actual time=6553.620..6553.620 rows=2544 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 134kB
-> Nested Loop (cost=171201.54..172318.46 rows=141 width=24) (actual time=6550.096..6552.789 rows=2544 loo
ps=1)
-> Nested Loop (cost=171201.12..172243.46 rows=16 width=32) (actual time=6550.077..6550.305 rows=256
loops=1)
-> HashAggregate (cost=171200.84..171250.07 rows=4923 width=24) (actual time=6542.508..6542.53
5 rows=16 loops=1)
-> Hash Join (cost=8.95..171188.53 rows=4923 width=24) (actual time=12.705..6542.472 row
s=48 loops=1)
Hash Cond: (e_1.usertemplatevar_id = ut_1.id)
-> Seq Scan on functionalvariables e_1 (cost=0.00..155513.07 rows=4164607 width=32
) (actual time=7.324..5008.051 rows=4164350 loops=1)
-> Hash (cost=8.75..8.75 rows=16 width=8) (actual time=0.033..0.033 rows=16 loops=
1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Index Scan using usertemp_utv_idx on usertemplatevariable ut_1 (cost=0.29
..8.75 rows=16 width=8) (actual time=0.018..0.026 rows=16 loops=1)
Index Cond: (usertempl_id = 15)
-> Materialize (cost=0.29..8.83 rows=16 width=8) (actual time=0.473..0.478 rows=16 loops=16)
-> Index Scan using usertemp_utv_idx on usertemplatevariable utv (cost=0.29..8.75 rows=1
6 width=8) (actual time=0.032..0.041 rows=16 loops=1)
Index Cond: (usertempl_id = 15)
-> Index Only Scan using usertemplvarattribute_atttypeid_key on usertemplvarattribute utva (cost=0.4
2..4.60 rows=9 width=8) (actual time=0.002..0.004 rows=10 loops=256)
Index Cond: (usertempvariable_fk = utv.id)
Heap Fetches: 0
Total runtime: 66476.942 ms
(67 rows)

Is this acceptable or can I get better results?
Thanks
Lana

>>-----Original Message-----
>>From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>>Sent: 25 July 2016 20:07
>>To: Abadie Lana
>>Cc: pgsql-performance(at)postgresql(dot)org
>>Subject: Re: [PERFORM] Very slow query (3-4mn) on a table with 25millions
>>rows
>>
>>Abadie Lana <Lana(dot)Abadie(at)iter(dot)org> writes:
>>> I'm having a problem with a slow query - I tried several things to optimize the
>>queries but didn't really help. The output of explain analyse shows sequential
>>scan on a table of 25 million rows. Even though it is indexed and (I put a multi-
>>column index on the fields used in the query), the explain utility shows no usage
>>of the scan...
>>
>>That index looks pretty useless judging from the rowcounts, so I'm not surprised
>>that the planner didn't use it. You might have better luck with an index on the
>>split_part expression
>>
>>split_part(split_part((s.attvalue)::text, ' '::text, 1), '.'::text, 1)
>>
>>since it's the join of that to e.name that seems to be actually selective.
>>(The planner doesn't appear to realize that it is, but ANALYZE'ing after creating
>>the index should fix that.)
>>
>> regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Martín Marqués 2016-07-26 10:34:17 Re: Very slow query (3-4mn) on a table with 25millions rows
Previous Message Tom Lane 2016-07-25 18:06:57 Re: Very slow query (3-4mn) on a table with 25millions rows