Quick Links

Re: Data Warehouse Reevaluation - MySQL vs Postgres --

From:	Mischa Sandberg <ischamay(dot)andbergsay(at)activestateway(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Data Warehouse Reevaluation - MySQL vs Postgres --
Date:	2004-09-14 22:58:20
Message-ID:	gIK1d.26646$XP3.14809@edtnps84
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Googling 'upsert' (an Oraclism, I believe) will get you hits on Oracle
and DB2's implementation of MERGE, which does what AMOUNTS to what is
described below (one mass UPDATE...FROM, one mass INSERT...WHERE NOT
EXISTS).

No, you shouldn't iterate row-by-row through the temp table.
Whenever possible, try to do updates in one single (mass) operation.
Doing it that way gives the optimizer the best chance at amortizing
fixed costs, and batching operations.

---------
In any database other than Postgres, I would recommend doing the
INSERT /followed by/ the UPDATE. That order looks wonky --- your update
ends up pointlessly operating on the rows just INSERTED. The trick is,
UPDATE acquires and holds write locks (the rows were previously visible
to other processes), while INSERT's write locks refer to rows that no
other process could try to lock.

Stephen Frost wrote:
> * Markus Schaber (schabios(at)logi-track(dot)com) wrote:
>
>>Generally, what is the fastest way for doing bulk processing of
>>update-if-primary-key-matches-and-insert-otherwise operations?
>
>
> This is a very good question, and I havn't seen much of an answer to it
> yet. I'm curious about the answer myself, actually. In the more recent
> SQL specs, from what I understand, this is essentially what the 'MERGE'
> command is for. This was recently added and unfortunately is not yet
> supported in Postgres. Hopefully it will be added soon.
>
> Otherwise, what I've done is basically an update followed by an insert
> using outer joins. If there's something better, I'd love to hear about
> it. The statements looks something like:
>
> update X
> set colA = a.colA,
> colB = a.colB
> from Y a
> where keyA = a.keyA and
> keyB = a.keyB;
>
> insert into X
> select a.keyA,
> a.keyB,
> a.colA,
> a.colB
> from Y a left join X b
> using (keyA, keyB)
> where b.keyA is NULL and
> b.keyB is NULL;
>
> With the appropriate indexes, this is pretty fast but I think a merge
> would be much faster.
>
> Thanks,
>
> Stephen

In response to

Re: Data Warehouse Reevaluation - MySQL vs Postgres -- at 2004-09-14 14:33:03 from Stephen Frost

Responses

Re: Data Warehouse Reevaluation - MySQL vs Postgres -- at 2004-09-16 04:29:04 from Christopher Kings-Lynne
Re: Data Warehouse Reevaluation - MySQL vs Postgres -- at 2004-09-16 10:39:04 from Markus Schaber

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Simon Riggs	2004-09-14 23:32:11	Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Previous Message	Jeffrey W. Baker	2004-09-14 22:48:51	Re: disk performance benchmarks