From: "Armand Pirvu (home)" <armand(dot)pirvu(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: update from performance question
Date: 2017-04-19 23:11:06
Message-ID: CAB35198-1D1F-4FB0-A574-338B6DE574E9@gmail.com
Lists: pgsql-performance
Hi Albe,
Thank you for your reply.
The query has changed a bit:
explain (analyze, buffers)
UPDATE
csischema.tf_transaction_item_person TRANS
SET
general_ledger_code = PURCH.general_ledger_code,
general_ledger_code_desc = PURCH.general_ledger_code_desc,
update_datetime = now()::timestamp(0)
FROM
csischema.tf_purchases_person PURCH
WHERE
PURCH.general_ledger_code IS NOT NULL AND
TRANS.purchased_log_id = PURCH.purchased_log_id AND
TRANS.general_ledger_code IS NULL
;
select count(*) from csischema.tf_transaction_item_person where general_ledger_code is null;
count
---------
1393515
select count(*) from csischema.tf_transaction_item_person ;
count
---------
3408380
select count(*) from csischema.tf_purchases_person;
count
----------
20760731
select count(*) from csischema.tf_purchases_person where general_ledger_code IS NOT NULL;
count
---------
6909204
But here is the kicker:
a SELECT count of the rows that the UPDATE would touch returns zero.
select count(trans.purchased_log_id) from
csischema.tf_transaction_item_person TRANS,
csischema.tf_purchases_person PURCH
WHERE
PURCH.general_ledger_code IS NOT NULL AND
TRANS.purchased_log_id = PURCH.purchased_log_id AND
TRANS.general_ledger_code IS NULL
;
count
-------
0
(1 row)
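Since the join produces no rows, the UPDATE has no work to do, and a cheap existence probe could confirm that up front before paying for the full join and count. A sketch against the same tables (illustrative only; PostgreSQL stops scanning as soon as one matching row is found):

```sql
-- Returns true/false; can bail out on the first match instead of
-- counting every joined row.
SELECT EXISTS (
    SELECT 1
    FROM csischema.tf_transaction_item_person TRANS
    JOIN csischema.tf_purchases_person PURCH
      ON TRANS.purchased_log_id = PURCH.purchased_log_id
    WHERE PURCH.general_ledger_code IS NOT NULL
      AND TRANS.general_ledger_code IS NULL
);
```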
Considering this, I wonder whether an index on csischema.tf_purchases_person (purchased_log_id, general_ledger_code) and one on tf_transaction_item_person (purchased_log_id, general_ledger_code) would help.
This is what bugs me.
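For what it's worth, since both sides of the join are filtered on general_ledger_code predicates, partial indexes matching those predicates might be a better fit than the plain composite indexes; a sketch with illustrative index names:

```sql
-- Covers only the purchases rows the UPDATE reads from
CREATE INDEX tf_pp_glc_notnull_idx
    ON csischema.tf_purchases_person (purchased_log_id)
    WHERE general_ledger_code IS NOT NULL;

-- Covers only the transaction rows still waiting to be filled in
CREATE INDEX tf_tip_glc_null_idx
    ON csischema.tf_transaction_item_person (purchased_log_id)
    WHERE general_ledger_code IS NULL;
```

Whether the planner actually uses them depends on selectivity (about a third of tf_purchases_person matches the NOT NULL filter here, so it may still prefer a seq scan for the hash build); running ANALYZE after creating them would be worth doing.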
I got the EXPLAIN output.
Without the indexes:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on tf_transaction_item_person trans (cost=1164684.43..1572235.51 rows=507748 width=227) (actual time=230320.060..230320.060 rows=0 loops=1)
Buffers: shared hit=120188 read=876478, temp read=93661 written=93631
-> Hash Join (cost=1164684.43..1572235.51 rows=507748 width=227) (actual time=230320.054..230320.054 rows=0 loops=1)
Hash Cond: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
Buffers: shared hit=120188 read=876478, temp read=93661 written=93631
-> Seq Scan on tf_transaction_item_person trans (cost=0.00..228945.93 rows=1542683 width=199) (actual time=13.312..52046.689 rows=1393515 loops=1)
Filter: (general_ledger_code IS NULL)
Rows Removed by Filter: 2014865
Buffers: shared read=191731
-> Hash (cost=1012542.32..1012542.32 rows=6833049 width=52) (actual time=152339.000..152339.000 rows=6909204 loops=1)
Buckets: 524288 Batches: 16 Memory Usage: 39882kB
Buffers: shared hit=120188 read=684747, temp written=57588
-> Seq Scan on tf_purchases_person purch (cost=0.00..1012542.32 rows=6833049 width=52) (actual time=8.252..140992.716 rows=6909204 loops=1)
Filter: (general_ledger_code IS NOT NULL)
Rows Removed by Filter: 13851527
Buffers: shared hit=120188 read=684747
Planning time: 0.867 ms
Execution time: 230328.223 ms
(18 rows)
With the indexes:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on tf_transaction_item_person trans (cost=1161742.22..1567806.87 rows=497927 width=228) (actual time=155171.388..155171.388 rows=0 loops=1)
Buffers: shared hit=88095 read=908571, temp read=93661 written=93631
-> Hash Join (cost=1161742.22..1567806.87 rows=497927 width=228) (actual time=155171.358..155171.358 rows=0 loops=1)
Hash Cond: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
Buffers: shared hit=88095 read=908571, temp read=93661 written=93631
-> Seq Scan on tf_transaction_item_person trans (cost=0.00..228945.93 rows=1542683 width=199) (actual time=16.801..31016.221 rows=1393515 loops=1)
Filter: (general_ledger_code IS NULL)
Rows Removed by Filter: 2014865
Buffers: shared read=191731
-> Hash (cost=1012542.32..1012542.32 rows=6700872 width=53) (actual time=105101.946..105101.946 rows=6909204 loops=1)
Buckets: 524288 Batches: 16 Memory Usage: 39882kB
Buffers: shared hit=88095 read=716840, temp written=57588
-> Seq Scan on tf_purchases_person purch (cost=0.00..1012542.32 rows=6700872 width=53) (actual time=13.823..95970.776 rows=6909204 loops=1)
Filter: (general_ledger_code IS NOT NULL)
Rows Removed by Filter: 13851527
Buffers: shared hit=88095 read=716840
Planning time: 90.409 ms
Execution time: 155179.181 ms
(18 rows)
Thanks
Armand
On Apr 19, 2017, at 3:06 AM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:
> Armand Pirvu wrote:
>> Running 9.5.2
>>
>> I have the following update and run into a bit of a trouble . I realize the tables
>> involved have quite some data but here goes
>>
>>
>> UPDATE
>> tf_transaction_item_person TRANS
>> SET
>> general_ledger_code = PURCH.general_ledger_code,
>> general_ledger_code_desc = PURCH.general_ledger_code_desc,
>> update_datetime = now()::timestamp(0)
>> FROM
>> tf_purchases_person PURCH
>> WHERE
>> PURCH.general_ledger_code != '' AND
>> TRANS.purchased_log_id = PURCH.purchased_log_id AND
>> TRANS.general_ledger_code != PURCH.general_ledger_code
>> ;
> [...]
>> Table "tf_transaction_item_person"
> [...]
>> Indexes:
>> "tf_transaction_item_person_pkey" PRIMARY KEY, btree (person_transaction_item_id)
>> "tf_tip_idx" btree (client_id, update_datetime)
>> "tf_tip_isdel_idx" btree (show_id, person_transaction_item_id)
>
> You don't show EXPLAIN (ANALYZE, BUFFERS) output for the problematic query,
> so it is difficult to say where the time is spent.
>
> But since you say that the same query without the UPDATE also takes more than
> a minute, the duration for the UPDATE is not outrageous.
> It may well be that much of the time is spent updating the index
> entries for the 3.5 million affected rows.
>
> I don't know if dropping indexes for the duration of the query and recreating
> them afterwards would be a net win, but you should consider it.
>
> It may be that the only ways to improve performance would be general
> things like faster I/O, higher max_wal_size setting, and, most of all,
> enough RAM in the machine to contain the whole database.
>
> Yours,
> Laurenz Albe