Re: Help with slow table update

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Pawel Veselov <pawel(dot)veselov(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Help with slow table update
Date: 2015-04-14 22:29:26
Message-ID: 552D94C6.4030004@BlueTreble.com
Lists: pgsql-general

On 4/14/15 4:44 PM, Pawel Veselov wrote:
> On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>
> On 4/14/15 1:28 PM, Pawel Veselov wrote:
>
>
> I wonder if what I need to do, considering that I update a lot of "the
> same" rows as I process this queue, is to create a temp table, update
> the rows there, and then update the actual tables once at the end...
>
>
> That's what I'd do.
>
>
> Well, in short, I changed (the body of the loop is repeated for however
> many tables there are)
>
> LOOP (item)
>   UPDATE table with item
>   IF not found INSERT item INTO table; END IF;
> END LOOP;
>
> to:
>
> CREATE TEMP TABLE xq_table (like table) on commit drop;
> LOOP (item)
>   LOOP
>     UPDATE xq_table with item;
>     exit when found;
>     INSERT INTO xq_table select * from table for update;
>     continue when found;
>     INSERT item INTO xq_table;
>     exit;
>   END LOOP;
> END LOOP;
> UPDATE table a set (rows) = (xq.rows)
>   FROM xq_table xq
>   WHERE (a.keys) = (xq.keys)
>
> That works significantly faster. The final update statement is very
> fast. The process is somewhat slow in the beginning as it sucks in
> records from "total" into "xq_total", but once all of that is moved into
> the temp table, it rushes through the rest.

Databases like to think in sets. It will generally be more efficient to
do set operations instead of a bunch of row-by-row stuff.

Since you're pulling all of this from some other table, your best bet is
probably something like:

CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;

CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;

-- Same thing for daily
-- Same thing for total
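
To make that concrete, here is roughly how the first of those steps could be
spelled out. This is only a sketch: the column names (batch_id, hour_key, cnt)
and the predicate that picks the batch are made up for illustration, and the
DELETE ... RETURNING is routed through a CTE into a pre-created temp table so
the queue rows are consumed and captured in one statement.

CREATE TEMP TABLE raw (LIKE queue) ON COMMIT DROP;

WITH moved AS (
    DELETE FROM queue
    WHERE batch_id = 42               -- hypothetical predicate for this batch
    RETURNING *
)
INSERT INTO raw SELECT * FROM moved;

CREATE TEMP VIEW hourly_v AS
    SELECT hour_key, sum(cnt) AS cnt  -- hypothetical columns
      FROM raw
     GROUP BY hour_key;

-- One UPDATE plus one INSERT per summary table instead of per-row churn.
UPDATE ar_hourly a
   SET cnt = a.cnt + v.cnt
  FROM hourly_v v
 WHERE a.hour_key = v.hour_key;

INSERT INTO ar_hourly (hour_key, cnt)
SELECT v.hour_key, v.cnt
  FROM hourly_v v
  LEFT JOIN ar_hourly a ON a.hour_key = v.hour_key
 WHERE a.hour_key IS NULL;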

> The other option would be to use a constraint trigger paired with a
> per-row trigger on the hourly table to drive the daily table, and on
> the daily table to drive the total table. The way that would work is
> the per-row trigger would simply keep track of all the unique records
> that were changed in a statement (presumably by putting them in a
> temp table). Once the statement is "done", the constraint trigger
> would fire; it would summarize all the changed data and do a much
> smaller number of updates to the table being summarized into.
>
>
> I'm not sure how I would be able to avoid the same number of changes on
> the total table; the trigger would fire on each update, wouldn't it? So,
> same problem with a lot of changes on a table...

The difference is that you'd be doing plain INSERTs into a temp table
and then summarizing that. That's going to be a LOT more efficient than
a slew of updates on an existing table.
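
As a rough sketch of the per-row half of that idea (ar_hourly, hour_key and
the changed_hours temp table are stand-in names, and the temp table has to
exist in the session before the trigger fires):

CREATE TEMP TABLE changed_hours (hour_key timestamptz PRIMARY KEY);

CREATE OR REPLACE FUNCTION note_hourly_change() RETURNS trigger AS $$
BEGIN
    -- Just remember which key was touched; the summarization happens later,
    -- once, over this (small) list of keys.
    INSERT INTO changed_hours (hour_key)
    SELECT NEW.hour_key
    WHERE NOT EXISTS
        (SELECT 1 FROM changed_hours WHERE hour_key = NEW.hour_key);
    RETURN NULL;                      -- AFTER trigger, return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ar_hourly_note_change
    AFTER INSERT OR UPDATE ON ar_hourly
    FOR EACH ROW EXECUTE PROCEDURE note_hourly_change();

The deferred constraint trigger would then roll changed_hours up into the
daily table with a single UPDATE/INSERT pair, much like the set-based
statements above.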

> BTW, you also made a comment about not having to hit the table if
> you look at something in an index. You can only do that if all the
> data you need is in the index, AND the page with the record is
> marked as being all-visible (google for Postgres Visibility Map). If
> that's not the case then you still have to pull in the row from the
> table in order to determine visibility. The only case where you can
> still avoid hitting the table is something like a NOT EXISTS; if you
> can't find any entries in the index for something then they
> definitely won't be in the table.
>
>
> What I was saying is that if a table has a unique index, and there is a
> cached fact that a particular index value points to a particular row,
> there shouldn't be a need to re-scan the index again to search for any
> more matching values (which would be necessary if the index was not
> unique). Again, that's all considering the size of the index, the number
> of different index values that are being queried, etc.

It still has to rescan because of visibility concerns.
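
For what it's worth, you can get a rough idea of how much of a table is
currently all-visible (and therefore how often an index-only scan can skip
the heap) straight from pg_class; the table name below is just an example:

SELECT relname, relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_visible
  FROM pg_class
 WHERE relname = 'ar_hourly';         -- substitute the table in question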

> But remember that if you update or delete a row, removing it from an
> index, the data will stay in that index until vacuum comes along.
>
> Also, there's no point in doing a REINDEX after a VACUUM FULL;
> vacuum full rebuilds all the indexes for you.
>
>
> I was being desperate :)
>
> I still think there is something very wrong with this particular table.
> First, I have production systems that employ this function on way larger
> data sets, and there is no problem (so far, but still). This machine is
> part of a test deployment; there is no constant load, and the only data
> being written now is from these tests. Vacuuming should prune all that
> dead stuff, and if it's absent, it's unclear where the time is spent
> navigating/updating the table with 24 rows :)

I think you definitely have a problem with dead rows, as evidenced by
the huge improvement VACUUM FULL made.
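
If you want to keep an eye on whether dead rows pile up again, the stats
collector already tracks that; for example (the table name is just a
placeholder):

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname = 'ar_total';          -- hypothetical name for the totals table
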
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
