Re: COPY locking

From: John Coers <coers(at)intrinsity(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: COPY locking
Date: 2001-05-10 14:44:03
Message-ID: 3AFAA933.B01F403A@intrinsity.com
Lists: pgsql-general


> No, they all share the same disk buffer in shared memory for whatever is
> currently the last page of the file. Adding a tuple into that page
> requires grabbing a short-term lock on that page (since obviously only
> one process can do it at a time, else they'd be trying to insert into
> the same physical spot). That lock is released as soon as the tuple is
> physically stored in the buffer, but with enough processes running COPY
> into the same table, you'll get contention for the lock.
Ah so! So after each tuple is written to shared memory there is a giant
scramble for the lock. That explains all the semop()s, timer sets, and
sleeps I've been seeing in the truss output.
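If I follow, the critical section per tuple looks roughly like the toy
sketch below. This is not PostgreSQL's actual code or data structures; a
pthread mutex just stands in for the short-term buffer lock, and the
names (Page, add_tuple) are made up for illustration:

    #include <pthread.h>
    #include <string.h>

    /* Toy stand-in for a shared-memory disk buffer: the mutex plays
     * the role of the short-term lock on the table's last page. */
    typedef struct Page
    {
        pthread_mutex_t lock;          /* short-term lock on this buffer */
        int             free_offset;   /* next free byte in the page */
        char            data[8192];
    } Page;

    static Page last_page = { PTHREAD_MUTEX_INITIALIZER, 0, {0} };

    /* Every backend doing COPY funnels through this critical section
     * once per tuple -- which is where the contention comes from. */
    static void
    add_tuple(const char *tuple, int len)
    {
        pthread_mutex_lock(&last_page.lock);    /* one inserter at a time */
        if (last_page.free_offset + len <= (int) sizeof(last_page.data))
        {
            memcpy(last_page.data + last_page.free_offset, tuple, len);
            last_page.free_offset += len;
        }
        pthread_mutex_unlock(&last_page.lock);  /* released immediately */
    }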


> I suspect the performance issue you're seeing is not so much the need
> for a short-term lock (it's awful hard to see how to do without one,
> anyway) as it is that our current spinlock implementation is pretty bad
> in the face of heavy contention. On most platforms the backoff when
> you can't get the lock on the first try is a ten-millisecond sleep,
> which is huge overkill for locks that are only held for periods of
> microseconds. There was discussion in pghackers a few months ago about
> reimplementing spinlocks in some more modern way (eg, using Posix
> semaphores if available) but no progress has been made yet.

I saw that thread while looking for answers to my problem. Put me down as a
"yea" vote. My use of this system is different from most, I think: I need to
cram huge amounts of data in from multiple clients, then pull it all out and
process it later, with only the occasional query to monitor progress. I don't
need fancy queries.
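For reference, the spin-then-sleep pattern described above looks roughly
like this minimal sketch. The GCC __sync builtins stand in for the
platform-specific test-and-set, and the retry limit is illustrative --
this is the general shape of the problem, not PostgreSQL's actual
spinlock code:

    #include <unistd.h>

    typedef int slock_t;

    /* GCC builtin standing in for the platform's test-and-set;
     * returns the previous value, i.e. nonzero if already held. */
    static int
    tas(slock_t *lock)
    {
        return __sync_lock_test_and_set(lock, 1);
    }

    static void
    s_lock(slock_t *lock)
    {
        int spins = 0;

        while (tas(lock))
        {
            if (++spins > 100)      /* illustrative retry limit */
            {
                usleep(10000);      /* the ~10 ms backoff: huge overkill
                                     * for a lock held for microseconds */
                spins = 0;
            }
        }
    }

    static void
    s_unlock(slock_t *lock)
    {
        __sync_lock_release(lock);
    }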


> > Are there any suggested techniques or tweaks I can make to avoid this
> > interference?
>
> Do you really need to run multiple COPYs in parallel, or would
> serializing them be just as good? You could serialize them without
> locking out readers by doing
>
> BEGIN;
> LOCK TABLE foo IN EXCLUSIVE MODE;
> COPY foo FROM ...
> END;
>
> "EXCLUSIVE" mode isn't quite as exclusive as a plain LOCK TABLE; it
> still allows other readers. See "Table-level locks" in the User's
> Guide.
>

They don't have to be parallel in the strictest sense. As clients finish
their jobs they will try to connect and upload data. If serializing the
process makes it faster, then that's what I'll try, counterintuitive though
it may be. Of course, getting my best performance at N=2 was counterintuitive too...
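For what it's worth, a minimal libpq sketch of what each client would run,
using the LOCK TABLE recipe above. The table foo(id int, val text), the
connection string, and the sample rows are assumptions for illustration,
and error handling is trimmed:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=test");
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* Serialize: other COPYs queue up behind the table lock instead
         * of fighting over the last-page buffer lock tuple by tuple. */
        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "LOCK TABLE foo IN EXCLUSIVE MODE"));

        res = PQexec(conn, "COPY foo FROM STDIN");
        if (PQresultStatus(res) == PGRES_COPY_IN)
        {
            PQputline(conn, "1\tfirst row\n");
            PQputline(conn, "2\tsecond row\n");
            PQputline(conn, "\\.\n");   /* end-of-data marker */
            PQendcopy(conn);
        }
        PQclear(res);

        PQclear(PQexec(conn, "END"));
        PQfinish(conn);
        return 0;
    }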

--
John Coers Intrinsity, Inc.
coers(at)intrinsity(dot)com Austin, Texas
