Re: Adding new field to big table

From: RichmondDyes(at)monroehosp(dot)org
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>, pgsql-performance-owner(at)postgresql(dot)org
Subject: Re: Adding new field to big table
Date: 2014-04-23 17:59:07
Message-ID: OF0693CD36.2D544F40-ON85257CC3.0062BF39-85257CC3.0062CC51@mc.rochester.lib.ny.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Jeff I think adding the new table is the best way to handle this issue.

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: acanada <acanada(at)cnio(dot)es>,
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Date: 03/18/2014 02:05 AM
Subject: Re: [PERFORM] Adding new field to big table
Sent by: pgsql-performance-owner(at)postgresql(dot)org

On Monday, March 17, 2014, acanada <acanada(at)cnio(dot)es> wrote:
Hello,

Jeff and Jeffrey thank you for your tips.
This is the explain of the query:
x=> explain update document as d set "svmConfidence" = st.svmconfidence
from svm_confidence_id_tmp as st where st.id = d.id;
QUERY PLAN

---------------------------------------------------------------------------------------------------------
Update on document d (cost=4204242.82..61669685.86 rows=124515592
width=284)
-> Hash Join (cost=4204242.82..61669685.86 rows=124515592 width=284)
Hash Cond: (d.id = st.id)
-> Seq Scan on document d (cost=0.00..8579122.97 rows=203066697
width=270)
-> Hash (cost=1918213.92..1918213.92 rows=124515592 width=18)
-> Seq Scan on svm_confidence_id_tmp st
(cost=0.00..1918213.92 rows=124515592 width=18)
(6 rows)

It's not using the index, most of the rows are beeing updated.
I'm trying with the CTAS solution.

Once this hash join spills to disk, the performance is going to get very
bad. The problem is that the outer table is going to get split into
batches and written to disk. If this were just a select, that would not
be a problem because when it reads each batch back in, that is all it
needs to do as the temp file contains all the necessary info. But with an
update, each batch that it reads back in and matches to the inner side, it
then needs to back to the physical table to do the update, using the ctid
saved in the batches to find the table tuple. So in effect this adds a
nested loop from the hashed copy of the table to the real copy, and the
locality of reference between those is poor when there are many batches.
I'm pretty sure that the extra cost of doing this look up is not taken
into account by the planner. But, if it chooses a different plan than a
hash join, that other plan might also have the same problem.

Some things for you to consider, other than CTAS:

1) Are you analyzing your temporary table before you do the update? That
might switch it to a different plan.

2) Make work_mem as large as you can stand, just for the one session that
runs the update, to try to avoid spilling to disk.

3) If you set enable_hashjoin off temporarily in this session, what plan
do you get?

0) Why are you creating the temporary table? You must have some process
that comes up with the value for the new column to put in the temporary
table, why not just stick it directly into the original table?

Some things for the PostgreSQL hackers to consider:

1) When the hash spills to disk, it seems to write to disk the entire row
that is going to be updated (except for the one column which is going to
be overwritten) plus that tuple's ctid. It doesn't seem like this is
necessary, it should only need to write the ctid and the join key (and
perhaps any quals?). Since it has to visit the old row anyway to set its
cmax, it can pull out the rest of the data to make the new tuple while it
is there. If it wrote a lot less data to the temp tables it could make a
lot less batches for the same work_mem, and here the cost is directly
proportional to the number of batches. (Also, for the table not being
updated, it writes the ctid to temp space when there seems to be no use
for it.)

2) Should the planner account for the scattered reads needed to join to
the original table on ctid for update from whatever materialized version
of the table is created? Of course all other plans would also need to be
similarly instrumented. I think most of them would have the same problem
as the hash_join. The one type I can think of that doesn't would be a
merge join in which there is strong correlation between the merge key and
the ctid order on the table to be updated.

3) It seems like the truly efficient way to run such an update on a very
large data set would be join the two tables (hash or merge), then sort the
result on the ctid of the updated table, then do a "merge join" between
that sorted result and the physical table. I don't think such a method
currently is known to the planner, is it? How hard would it be to make
it?

The example I have been using is:

alter table pgbench_accounts add filler2 text;
create table foo as select aid, md5(aid::text) from pgbench_accounts;
analyze;
explain (verbose) update pgbench_accounts set filler2 =md5 from foo where
pgbench_accounts.aid=foo.aid;

Cheers,

Jeff

-- Confidentiality Notice --
This email message, including all the attachments, is for the sole use of the intended recipient(s) and contains confidential information. Unauthorized use or disclosure is prohibited. If you are not the intended recipient, you may not use, disclose, copy or disseminate this information. If you are not the intended recipient, please contact the sender immediately by reply email and destroy all copies of the original message,
including attachments.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2014-04-23 18:04:59 Re: HFS+ pg_test_fsync performance
Previous Message Heikki Linnakangas 2014-04-23 13:00:11 Re: tsearch2, large data and indexes