From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Extent Locks |
Date: | 2013-05-17 03:55:37 |
Message-ID: | 20130517035537.GX4361@tamriel.snowman.net |
Lists: | pgsql-hackers |
* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> I think it's pretty unrealistic to suppose that this can be made to
> work. The most obvious problem is that a sequential scan is coded to
> assume that every block between 0 and the last block in the relation
> is worth reading,
You don't change that. However, when a seq scan asks the storage layer
for blocks that it knows don't actually exist, it can simply skip over
them or return "empty" records or something equivalent... Yes, that's
hand-wavy, but I also think it's doable.
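To make the hand-waving slightly more concrete, here is a toy sketch in Python (not PostgreSQL code). SparseStorage, read_block and seq_scan are hypothetical names invented for illustration. The scan still walks every block number from 0 to the last block, and the storage layer simply reports "no such block" for the holes:

```python
from typing import Dict, Iterator, List, Optional


class SparseStorage:
    """Hypothetical storage layer: block numbers run 0..nblocks-1,
    but some of them were never actually allocated."""

    def __init__(self, nblocks: int, blocks: Dict[int, List[str]]) -> None:
        self.nblocks = nblocks
        self._blocks = blocks

    def read_block(self, blkno: int) -> Optional[List[str]]:
        # None means "this block does not exist"; callers treat it as empty.
        return self._blocks.get(blkno)


def seq_scan(storage: SparseStorage) -> Iterator[str]:
    # The scan is unchanged in spirit: it visits every block number in order.
    for blkno in range(storage.nblocks):
        rows = storage.read_block(blkno)
        if rows is None:
            continue  # skip over a block the storage layer knows is missing
        yield from rows


storage = SparseStorage(5, {0: ["a", "b"], 3: ["c"]})
print(list(seq_scan(storage)))
```

This says nothing about the other places that may assume dense block numbers; it only illustrates the seq scan case.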
> I suspect there are
> slightly less obvious problems that would turn out to be highly
> intractable.
Entirely possible. :)
> The assumption that block numbers are dense is probably
> embedded in the system in a lot of subtle ways; if we start trying to
> change that, I think we're dooming ourselves to an unending series of crocks
> trying to undo the mess we've created.
Perhaps.
> Also, I think that's really a red herring anyway. Relation extension
> per se is not slow - we can grow a file by adding zero bytes at a
> pretty good clip, and don't really gain anything at the database level
> by spreading the growth across multiple files.
That's true when the file is on a single filesystem and a single set of
drives. Make them be split across multiple filesystems/volumes where
you get more drives involved...
> The problem is the
> relation extension LOCK, and I think that's where we should be
> focusing our attention. I'm pretty confident we can find a way to
> take the pressure off the lock without actually changing anything at
> all at the storage layer.
That would certainly be very neat and if possible might render my idea
moot, which I would be more than happy with.
> As a thought experiment, suppose for example
> that we have a background process that knows, by magic, how many new
> blocks will be needed in each relation. And it knows this just enough
> in advance to have time to extend each such relation by the requisite
> number of blocks and add those blocks to the free space map. Since
> only that process ever needs a relation extension lock, there is no
> longer any contention for any such lock. Problem solved!
Sounds cute, but perhaps a bit too cute to be realistic (that's
certainly been my opinion when suggested by others, which it has been,
in the past).
> Actually, I'm not convinced that a background process is the right
> approach at all, and of course there's no actual magic that lets us
> foresee exact extension needs. But I still feel like that thought
> experiment indicates that there must be a solution here just by
> rejiggering the locking, and maybe with a bit of modest pre-extension.
> The mediocre results of my last couple tries must indicate that I
> wasn't entirely successful in getting the backends out of each others'
> way, but I tend to think that's just an indication that I don't
> understand exactly what's happening in the contention scenarios yet,
> rather than a fundamental difficulty with the approach.
Perhaps.
> > How many concurrent writers did you have and what kind of filesystem was
> > backing this? Was it a temp filesystem where writes are essentially to
> > memory, causing this relation extension lock to be much more
> > contentious?
>
> 10. ext4. No.
Ok.
> If I took 30 seconds to pre-extend the relation before writing any
> data into it, then writing the data went pretty much exactly 10 times
> faster with 10 writers than with 1.
That's rather fantastic..
> But small on-the-fly
> pre-extensions during the write didn't work as well. I don't remember
> exactly what formulas I tried, but I do remember that the few I tried
> were not really any better than "always pre-extend by 1 extra block";
> and that alone eliminated about half the contention, but then I
> couldn't do better.
That seems quite odd to me- I would have thought extending by more than
2 blocks would have helped with the contention. Still, it sounds like
extending requires a fair bit of writing, and that sucks in its own
right because we're just going to rewrite that- is that correct? If so,
I like my proposal even more...
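For what it's worth, the intuition can be put into rough numbers with a toy model in Python (not PostgreSQL code; extension_lock_acquisitions is a hypothetical helper). It assumes every extension takes the relation extension lock exactly once and ignores how long the lock is held, which may well be what actually dominates:

```python
def extension_lock_acquisitions(blocks_needed: int, blocks_per_extension: int) -> int:
    """Count lock acquisitions needed to add blocks_needed blocks when each
    acquisition extends the relation by blocks_per_extension blocks."""
    if blocks_needed < 0 or blocks_per_extension < 1:
        raise ValueError("blocks_needed must be >= 0 and blocks_per_extension >= 1")
    # Ceiling division using integers only.
    return -(-blocks_needed // blocks_per_extension)


for n in (1, 2, 8):
    print(n, extension_lock_acquisitions(1000, n))
```

Under this model, extending by 2 blocks at a time halves the number of acquisitions, which is at least consistent with "always pre-extend by 1 extra block" removing about half the contention. The same model predicts that larger extensions should keep helping, so the fact that they didn't suggests the model is missing something, such as the time spent writing while the lock is held.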
> I wonder if I need to use LWLockAcquireOrWait().
I'm not seeing how/why that might help?
Thanks,
Stephen