From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extent Locks
Date: 2013-05-17 00:06:57
Message-ID: CA+TgmoahdZ_UF2jv3RmZwgLvmDPcF-Q_GauXuF94eQ-s_1oS4w@mail.gmail.com
Lists: pgsql-hackers
On Wed, May 15, 2013 at 8:54 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Starting a new thread to avoid hijacking Heikki's original, but...
>
> * Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
>> Truncating a heap at the end of vacuum, to release unused space back to
>> the OS, currently requires taking an AccessExclusiveLock. Although
>> it's only held for a short duration, it can be enough to cause a
>> hiccup in query processing while it's held. Also, if there is a
>> continuous stream of queries on the table, autovacuum never succeeds
>> in acquiring the lock, and thus the table never gets truncated.
>
> Extent locking suffers from very similar problems and we really need
> to improve this situation. With today's fast i/o systems, and massive
> numbers of CPUs in a single system, it's absolutely trivial to have a
> whole slew of processes trying to add data to a single relation and
> that access getting nearly serialized due to everyone waiting on the
> extent lock.
>
> Perhaps one really simple approach would be to scale the size of each
> newly created extent with the current size of the relation.
> I've no clue what level of effort is involved there but I'm hoping
> such an approach would help. I've long thought that it'd be very neat
> if we could simply give each bulk-inserter process its own 1G chunk
> to insert directly into w/o having to talk to anyone else. The
> creation of the specific 1G piece could, hopefully, be made atomic
> easily (either thanks to the OS or with our own locking), etc, etc.
>
> I'm sure it's many bricks shy of a load, but I wanted to raise the
> issue, again, as I've seen it happening on yet another high-volume
> write-intensive system.
I think you might be confused, or else I'm confused, because I don't
believe we have any such thing as an extent lock. What we do have is
a relation extension lock, but the size of the segment on disk has
nothing to do with that: there's only one for the whole relation, and
you hold it when adding a block to the relation. The organization of
blocks into 1GB segments happens at a much lower level of the system,
and is completely disconnected from the locking subsystem. So
changing the segment size wouldn't help with this problem, and would
actually be quite difficult to do, because everything in the system
above the very lowest layer knows only about block numbers and has no
idea which "extent" a given block is in.
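
To make the layering concrete, the two pieces look roughly like this
in C (a simplified sketch with made-up function names, not the actual
bufmgr.c/md.c code):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"

/*
 * Sketch only: one extension lock per relation, taken by every
 * backend that wants to add a block, regardless of which 1GB
 * segment the new block will land in.
 */
static Buffer
add_one_block(Relation rel)
{
    Buffer      buf;

    LockRelationForExtension(rel, ExclusiveLock);
    buf = ReadBuffer(rel, P_NEW);   /* allocates the next block number */
    UnlockRelationForExtension(rel, ExclusiveLock);

    return buf;                     /* pinned; caller locks and fills it */
}

/*
 * Only the storage manager (md.c) ever maps a block number to a
 * segment, roughly like this:
 */
static void
block_to_segment(BlockNumber blkno, BlockNumber *segno, BlockNumber *segoff)
{
    *segno = blkno / RELSEG_SIZE;   /* which relfilenode.N file */
    *segoff = blkno % RELSEG_SIZE;  /* block offset within that file */
}

Everything in between deals purely in block numbers.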
But that having been said, it just so happens that I was recently
playing around with ways of trying to fix the relation extension
bottleneck. One thing I tried was: every time a particular backend
extends the relation, it adds more than one block at a time before
releasing the relation extension lock. Then, other
backends can find those blocks in the free space map instead of having
to grab the relation extension lock, so the number of acquire/release
cycles on the relation extension lock goes down. This does help...
but at least in my tests, extending by 2 blocks instead of 1 was the
big winner, and after that you didn't get much further relief.
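
In rough outline, the experiment was shaped something like this (again
a sketch; extend_relation_by and the details here are illustrative,
not the actual patch):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"

/*
 * Sketch only: while holding the relation extension lock, add
 * extra_blocks empty pages and advertise them in the free space map,
 * so that other backends can pick them up without taking the lock.
 */
static void
extend_relation_by(Relation rel, int extra_blocks)
{
    int         i;

    LockRelationForExtension(rel, ExclusiveLock);

    for (i = 0; i < extra_blocks; i++)
    {
        Buffer      buf = ReadBuffer(rel, P_NEW);  /* grows by one block */
        Page        page;
        BlockNumber blkno;
        Size        freespace;

        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        page = BufferGetPage(buf);
        PageInit(page, BufferGetPageSize(buf), 0);
        MarkBufferDirty(buf);

        blkno = BufferGetBlockNumber(buf);
        freespace = PageGetHeapFreeSpace(page);
        UnlockReleaseBuffer(buf);

        /* Make the empty page findable by other backends. */
        RecordPageWithFreeSpace(rel, blkno, freespace);
    }

    UnlockRelationForExtension(rel, ExclusiveLock);

    /* Bubble the new entries up to the upper FSM levels. */
    FreeSpaceMapVacuum(rel);
}

That way the lock is taken once per batch instead of once per block.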
Another thing I tried was pre-extending the relation to the estimated
final size. That worked a lot better, and might be worth doing (e.g.
ALTER TABLE zorp SET MINIMUM SIZE 1GB), but a less manual solution
would be preferable if we can come up with one.
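
Pre-extension is then little more than a loop on top of the same
primitive; at the default 8kB block size, 1GB works out to 131072
blocks (sketch as before, reusing the illustrative extend_relation_by
from above):

/*
 * Sketch only: grow the relation up front to a target size, e.g.
 * 131072 blocks = 1GB at the default 8kB block size.
 */
static void
preextend_relation(Relation rel, BlockNumber target_blocks)
{
    BlockNumber nblocks = RelationGetNumberOfBlocks(rel);

    if (nblocks < target_blocks)
        extend_relation_by(rel, (int) (target_blocks - nblocks));
}

The hypothetical ALTER TABLE syntax above would just be a user-visible
way of invoking something like this.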
After that, I ran out of time for investigation.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company