Re: Adding new field to big table

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: acanada <acanada(at)cnio(dot)es>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Adding new field to big table
Date: 2014-03-18 06:04:23
Message-ID: CAMkU=1zp9C1gdfc-oqABW9tLMFXvUikN_ZYAEHeE8NqSADuriQ@mail.gmail.com
Lists: pgsql-performance

On Monday, March 17, 2014, acanada <acanada(at)cnio(dot)es> wrote:

> Hello,
>
> Jeff and Jeffrey thank you for your tips.
> This is the explain of the query:
> x=> explain update document as d set "svmConfidence" = st.svmconfidence
>      from svm_confidence_id_tmp as st where st.id = d.id;
>                                        QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
>  Update on document d  (cost=4204242.82..61669685.86 rows=124515592 width=284)
>    ->  Hash Join  (cost=4204242.82..61669685.86 rows=124515592 width=284)
>          Hash Cond: (d.id = st.id)
>          ->  Seq Scan on document d  (cost=0.00..8579122.97 rows=203066697 width=270)
>          ->  Hash  (cost=1918213.92..1918213.92 rows=124515592 width=18)
>                ->  Seq Scan on svm_confidence_id_tmp st  (cost=0.00..1918213.92 rows=124515592 width=18)
> (6 rows)
>
> It's not using the index; most of the rows are being updated.
> I'm trying with the CTAS solution.
>
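
Since you mention CTAS: a minimal sketch of that route, under my own
assumptions (id is unique in both tables, the column list below is
abbreviated, and indexes, constraints, and permissions would have to be
recreated on the new table before swapping it in):

create table document_new as
  select d.id,
         /* ... the other document columns ... */
         st.svmconfidence as "svmConfidence"
  from document as d
  left join svm_confidence_id_tmp as st on st.id = d.id;

-- then build the indexes/constraints on document_new and swap the names:
-- alter table document rename to document_old;
-- alter table document_new rename to document;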

Once this hash join spills to disk, the performance is going to get very
bad. The problem is that the outer table is going to get split into
batches and written to disk. If this were just a select, that would not be a
problem, because when it reads each batch back in, the temp file contains
all the info it needs. But with an update, for each batch that it reads back
in and matches to the inner side, it then needs to go back to the physical
table to do the update, using the ctid saved in the batch to find the table
tuple. So in effect this adds a nested
loop from the hashed copy of the table to the real copy, and the locality
of reference between those is poor when there are many batches. I'm pretty
sure that the extra cost of doing this lookup is not taken into account by
the planner. But, if it chooses a different plan than a hash join, that
other plan might also have the same problem.
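
If you want to see how badly it is spilling before changing anything, one
option is to run the update under EXPLAIN ANALYZE inside a transaction that
you roll back; the Hash node will then report the number of batches and the
memory used. A sketch using your table names (note that this really executes
the update before rolling it back, so it takes as long as the real thing;
you may want to try it on a subset first):

begin;
explain (analyze, buffers)
  update document as d set "svmConfidence" = st.svmconfidence
  from svm_confidence_id_tmp as st where st.id = d.id;
rollback;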

Some things for you to consider, other than CTAS:

1) Are you analyzing your temporary table before you do the update? That
might switch it to a different plan.

2) Make work_mem as large as you can stand, just for the one session that
runs the update, to try to avoid spilling to disk.

3) If you set enable_hashjoin off temporarily in this session, what plan do
you get? (A combined sketch of 1-3 follows after this list.)

0) Why are you creating the temporary table? You must have some process
that comes up with the value for the new column to put in the temporary
table, so why not just stick it directly into the original table?
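
For scale: the hashed side in your plan is estimated at 124,515,592 rows of
width 18, roughly 2.2 GB of raw data before per-tuple overhead, so keeping
the hash to a single batch takes a work_mem of several GB. A combined sketch
of 1-3, with a placeholder work_mem value:

analyze svm_confidence_id_tmp;   -- (1) fresh stats for the temp table

set work_mem = '4GB';            -- (2) session-local; placeholder, use what your RAM allows
explain update document as d set "svmConfidence" = st.svmconfidence
  from svm_confidence_id_tmp as st where st.id = d.id;

set enable_hashjoin = off;       -- (3) only to see what the alternative plan looks like
explain update document as d set "svmConfidence" = st.svmconfidence
  from svm_confidence_id_tmp as st where st.id = d.id;

reset enable_hashjoin;
reset work_mem;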

Some things for the PostgreSQL hackers to consider:

1) When the hash spills to disk, it seems to write to disk the entire row
that is going to be updated (except for the one column which is going to be
overwritten) plus that tuple's ctid. It doesn't seem like this is
necessary; it should only need to write the ctid and the join key (and
perhaps any quals?). Since it has to visit the old row anyway to set its
cmax, it can pull out the rest of the data to make the new tuple while it
is there. If it wrote a lot less data to the temp tables it could make a
lot fewer batches for the same work_mem, and here the cost is directly
proportional to the number of batches. (Also, for the table not being
updated, it writes the ctid to temp space when there seems to be no use for
it.)

2) Should the planner account for the scattered reads needed to join to the
original table on ctid for update from whatever materialized version of the
table is created? Of course all other plans would also need to be
similarly instrumented. I think most of them would have the same problem
as the hash join. The one type I can think of that doesn't would be a
merge join in which there is strong correlation between the merge key and
the ctid order on the table to be updated.

3) It seems like the truly efficient way to run such an update on a very
large data set would be to join the two tables (hash or merge), then sort the
result on the ctid of the updated table, then do a "merge join" between
that sorted result and the physical table. I don't think such a method
currently is known to the planner, is it? How hard would it be to make it?

The example I have been using is:

alter table pgbench_accounts add filler2 text;
create table foo as select aid, md5(aid::text) from pgbench_accounts;
analyze;
explain (verbose) update pgbench_accounts set filler2 = md5
  from foo where pgbench_accounts.aid = foo.aid;

Cheers,

Jeff
