From: | Jesse Long <jpl(at)unknown(dot)za(dot)net> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Incorrect index being used |
Date: | 2013-10-09 10:57:51 |
Message-ID: | 525536AF.6000909@unknown.za.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 09/10/2013 12:10, Albe Laurenz wrote:
> Jesse Long wrote:
>> I have the following query, run immediately after executing VACUUM in
>> the database. There is only one connection to the database.
> You should run ANALYZE, not VACUUM.
>
>> The query runs for much longer than I expect it to run for, and I think
>> this is due to it using the incorrect subplan. As you can see, subplans
>> 1 and 3 make use of and index, but these subplans are not used.
>> Subplans and 4 are seqscan, and they are used.
>>
>> How can I get PostgreSQL to use subplan 1 and 3?
> They are only possible if an "Index Only Scan" is possible, which
> can only be used if the respective table entries are visible for
> all transactions.
>
>> testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE r0.NODE_ID = 29 AND
>> r0.ARCHIVE_DATE >= '2013-07-08 18:28:00' AND (EXISTS (SELECT r1.* FROM ARCHIVE_DOCUMENT_INDEX AS r1
>> WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID = r0.NODE_ID AND r1.VALUE = 'BSH70002152' ) OR EXISTS (
>> SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID = r0.ID AND r2.NODE_ID = r0.NODE_ID
>> AND r2.VALUE = 'TC212592' ) ) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10;
> [...]
>
>> Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed
>> SubPlan 4))
>>
>> Rows Removed by Filter: 710851
>>
>> SubPlan 1
>>
>> -> Index Only Scan using archive_document_index_x_archive_id_node_id_value on
>> archive_document_index r1 (cost=0.57..4.59 rows=1 width=0) (never executed)
>>
>> Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value =
>> 'BSH70002152'::text))
>>
>> Heap Fetches: 0
>>
>> SubPlan 2
>>
>> -> Seq Scan on archive_document_index r1_1 (cost=0.00..1958104.00 rows=1520 width=16)
>> (actual time=44418.383..44558.293 rows=4 loops=1)
>>
>> Filter: ((value)::text = 'BSH70002152'::text)
>>
>> Rows Removed by Filter: 95009919
>>
>> SubPlan 3
>>
>> -> Index Only Scan using archive_document_index_x_archive_id_node_id_value on
>> archive_document_index r2 (cost=0.57..4.59 rows=1 width=0) (never executed)
>>
>> Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value =
>> 'TC212592'::text))
>>
>> Heap Fetches: 0
>>
>> SubPlan 4
>>
>> -> Seq Scan on archive_document_index r2_1 (cost=0.00..1958104.00 rows=1520 width=16)
>> (actual time=41659.464..41663.342 rows=1 loops=1)
>>
>> Filter: ((value)::text = 'TC212592'::text)
>>
>> Rows Removed by Filter: 95009922
> The estimates are quite off.
> Does "ANALYZE archive_document", possibly after increasing
> default_statistics_target, make a difference?
>
> Yours,
> Laurenz Albe
>
Hi Laurenz,
Thank you for the feedback.
There is no problem with row visibility, there is only one connection to
the database - the connection I am using to do these selects.
Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both
tables concerned, but not much changed:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..151.85 rows=10 width=122) (actual time=40841.984..85668.213 rows=2 loops=1)
-> Index Scan Backward using idx_archive_document_x_node_id_archive_date on archive_document r0 (cost=0.56..7627640.20 rows=504186 width=122) (actual time=40841.98
Index Cond: ((node_id = 29) AND (archive_date >= '2013-07-08 18:28:00'::timestamp without time zone))
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
Rows Removed by Filter: 710851
SubPlan 1
-> Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r1 (cost=0.57..4.59 rows=1 width=0) (never executed)
Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'BSH70002152'::text))
Heap Fetches: 0
SubPlan 2
-> Seq Scan on archive_document_index r1_1 (cost=0.00..1958101.80 rows=1568 width=16) (actual time=36633.365..40841.909 rows=4 loops=1)
Filter: ((value)::text = 'BSH70002152'::text)
Rows Removed by Filter: 95009919
SubPlan 3
-> Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r2 (cost=0.57..4.59 rows=1 width=0) (never executed)
Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'TC212592'::text))
Heap Fetches: 0
SubPlan 4
-> Seq Scan on archive_document_index r2_1 (cost=0.00..1958101.80 rows=1568 width=16) (actual time=40241.599..44462.485 rows=1 loops=1)
Filter: ((value)::text = 'TC212592'::text)
Rows Removed by Filter: 95009922
Total runtime: 85676.734 ms
(22 rows)
Thanks,
Jesse
From | Date | Subject | |
---|---|---|---|
Next Message | Jesse Long | 2013-10-09 11:02:05 | Re: Incorrect index being used |
Previous Message | Albe Laurenz | 2013-10-09 10:10:01 | Re: Incorrect index being used |