Re: update from performance question

From: "Armand Pirvu (home)" <armand(dot)pirvu(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: update from performance question
Date: 2017-04-19 23:11:06
Message-ID: CAB35198-1D1F-4FB0-A574-338B6DE574E9@gmail.com
Lists: pgsql-performance

Hi Albe,

Thank you for your reply.

The query has changed a bit:

explain (analyze, buffers)
UPDATE
csischema.tf_transaction_item_person TRANS
SET
general_ledger_code = PURCH.general_ledger_code,
general_ledger_code_desc = PURCH.general_ledger_code_desc,
update_datetime = now()::timestamp(0)
FROM
csischema.tf_purchases_person PURCH
WHERE
PURCH.general_ledger_code IS NOT NULL AND
TRANS.purchased_log_id = PURCH.purchased_log_id AND
TRANS.general_ledger_code IS NULL
;

select count(*) from csischema.tf_transaction_item_person where general_ledger_code is null;
count
---------
1393515

select count(*) from csischema.tf_transaction_item_person ;
count
---------
3408380

select count(*) from csischema.tf_purchases_person;
count
----------
20760731

select count(*) from csischema.tf_purchases_person where general_ledger_code IS NOT NULL;
count
---------
6909204

But here is the kicker.

A SELECT count to see how many records would be updated returns zero:

select count(trans.purchased_log_id) from
csischema.tf_transaction_item_person TRANS,
csischema.tf_purchases_person PURCH
WHERE
PURCH.general_ledger_code IS NOT NULL AND
TRANS.purchased_log_id = PURCH.purchased_log_id AND
TRANS.general_ledger_code IS NULL
;
count
-------
0
(1 row)
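
As a cheaper sanity check (just a sketch on my end, not something I have timed), an existence test should show the same thing without counting every match:

-- Stop at the first matching row instead of counting all of them.
SELECT EXISTS (
    SELECT 1
    FROM csischema.tf_transaction_item_person TRANS
    JOIN csischema.tf_purchases_person PURCH
      ON TRANS.purchased_log_id = PURCH.purchased_log_id
    WHERE PURCH.general_ledger_code IS NOT NULL
      AND TRANS.general_ledger_code IS NULL
);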

Considering this, I wonder whether an index on csischema.tf_purchases_person (purchased_log_id, general_ledger_code) and one on tf_transaction_item_person (purchased_log_id, general_ledger_code) would help.
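
Concretely, the kind of definitions I mean would be roughly the following (just a sketch; the index names are made up, and partial indexes matching the predicates might be another option):

-- Plain composite indexes as described above; partial variants with
-- WHERE general_ledger_code IS NOT NULL / IS NULL are another option.
CREATE INDEX tf_pp_plid_glc_idx
    ON csischema.tf_purchases_person (purchased_log_id, general_ledger_code);

CREATE INDEX tf_tip_plid_glc_idx
    ON csischema.tf_transaction_item_person (purchased_log_id, general_ledger_code);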

This is what bugs me.

I captured the EXPLAIN (ANALYZE, BUFFERS) output.

Without the indexes:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on tf_transaction_item_person trans  (cost=1164684.43..1572235.51 rows=507748 width=227) (actual time=230320.060..230320.060 rows=0 loops=1)
   Buffers: shared hit=120188 read=876478, temp read=93661 written=93631
   ->  Hash Join  (cost=1164684.43..1572235.51 rows=507748 width=227) (actual time=230320.054..230320.054 rows=0 loops=1)
         Hash Cond: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
         Buffers: shared hit=120188 read=876478, temp read=93661 written=93631
         ->  Seq Scan on tf_transaction_item_person trans  (cost=0.00..228945.93 rows=1542683 width=199) (actual time=13.312..52046.689 rows=1393515 loops=1)
               Filter: (general_ledger_code IS NULL)
               Rows Removed by Filter: 2014865
               Buffers: shared read=191731
         ->  Hash  (cost=1012542.32..1012542.32 rows=6833049 width=52) (actual time=152339.000..152339.000 rows=6909204 loops=1)
               Buckets: 524288  Batches: 16  Memory Usage: 39882kB
               Buffers: shared hit=120188 read=684747, temp written=57588
               ->  Seq Scan on tf_purchases_person purch  (cost=0.00..1012542.32 rows=6833049 width=52) (actual time=8.252..140992.716 rows=6909204 loops=1)
                     Filter: (general_ledger_code IS NOT NULL)
                     Rows Removed by Filter: 13851527
                     Buffers: shared hit=120188 read=684747
 Planning time: 0.867 ms
 Execution time: 230328.223 ms
(18 rows)

With the indexes:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on tf_transaction_item_person trans  (cost=1161742.22..1567806.87 rows=497927 width=228) (actual time=155171.388..155171.388 rows=0 loops=1)
   Buffers: shared hit=88095 read=908571, temp read=93661 written=93631
   ->  Hash Join  (cost=1161742.22..1567806.87 rows=497927 width=228) (actual time=155171.358..155171.358 rows=0 loops=1)
         Hash Cond: ((trans.purchased_log_id)::text = (purch.purchased_log_id)::text)
         Buffers: shared hit=88095 read=908571, temp read=93661 written=93631
         ->  Seq Scan on tf_transaction_item_person trans  (cost=0.00..228945.93 rows=1542683 width=199) (actual time=16.801..31016.221 rows=1393515 loops=1)
               Filter: (general_ledger_code IS NULL)
               Rows Removed by Filter: 2014865
               Buffers: shared read=191731
         ->  Hash  (cost=1012542.32..1012542.32 rows=6700872 width=53) (actual time=105101.946..105101.946 rows=6909204 loops=1)
               Buckets: 524288  Batches: 16  Memory Usage: 39882kB
               Buffers: shared hit=88095 read=716840, temp written=57588
               ->  Seq Scan on tf_purchases_person purch  (cost=0.00..1012542.32 rows=6700872 width=53) (actual time=13.823..95970.776 rows=6909204 loops=1)
                     Filter: (general_ledger_code IS NOT NULL)
                     Rows Removed by Filter: 13851527
                     Buffers: shared hit=88095 read=716840
 Planning time: 90.409 ms
 Execution time: 155179.181 ms
(18 rows)

Thanks
Armand

On Apr 19, 2017, at 3:06 AM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:

> Armand Pirvu wrote:
>> Running 9.5.2
>>
>> I have the following update and run into a bit of a trouble . I realize the tables
>> involved have quite some data but here goes
>>
>>
>> UPDATE
>> tf_transaction_item_person TRANS
>> SET
>> general_ledger_code = PURCH.general_ledger_code,
>> general_ledger_code_desc = PURCH.general_ledger_code_desc,
>> update_datetime = now()::timestamp(0)
>> FROM
>> tf_purchases_person PURCH
>> WHERE
>> PURCH.general_ledger_code != '' AND
>> TRANS.purchased_log_id = PURCH.purchased_log_id AND
>> TRANS.general_ledger_code != PURCH.general_ledger_code
>> ;
> [...]
>> Table "tf_transaction_item_person"
> [...]
>> Indexes:
>> "tf_transaction_item_person_pkey" PRIMARY KEY, btree (person_transaction_item_id)
>> "tf_tip_idx" btree (client_id, update_datetime)
>> "tf_tip_isdel_idx" btree (show_id, person_transaction_item_id)
>
> You don't show EXPLAIN (ANALYZE, BUFFERS) output for the problematic query,
> so it is difficult to say where the time is spent.
>
> But since you say that the same query without the UPDATE also takes more than
> a minute, the duration for the UPDATE is not outrageous.
> It may well be that much of the time is spent updating the index
> entries for the 3.5 million affected rows.
>
> I don't know if dropping indexes for the duration of the query and recreating
> them afterwards would be a net win, but you should consider it.
>
> It may be that the only ways to improve performance would be general
> things like faster I/O, higher max_wal_size setting, and, most of all,
> enough RAM in the machine to contain the whole database.
>
> Yours,
> Laurenz Albe
