Re: XLogInsert scaling, revisited

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XLogInsert scaling, revisited
Date: 2013-07-02 16:48:40
Message-ID: 51D30468.9060704@vmware.com
Lists: pgsql-hackers

On 27.06.2013 15:27, Andres Freund wrote:
> Hi,
>
> On 2013-06-26 18:52:30 +0300, Heikki Linnakangas wrote:
>>> * Could you document the way slots prevent checkpoints from occurring
>>> when XLogInsert rechecks for full page writes? I think it's correct -
>>> but not very obvious on a glance.
>>
>> There's this in the comment near the top of the file:
>>
>> * To update RedoRecPtr or fullPageWrites, one has to make sure that all
>> * subsequent inserters see the new value. This is done by reserving all the
>> * insertion slots before changing the value. XLogInsert reads RedoRecPtr and
>> * fullPageWrites after grabbing a slot, so by holding all the slots
>> * simultaneously, you can ensure that all subsequent inserts see the new
>> * value. Those fields change very seldom, so we prefer to be fast and
>> * non-contended when they need to be read, and slow when they're changed.
>>
>> Does that explain it well enough? XLogInsert holds onto a slot while it
>> rechecks for full page writes.
>
> Yes. Earlier on, that was explained in XLogInsert() - maybe point to the
> documentation at the top of the file? The file is too big to expect everyone
> to reread the whole file in a new release...
>
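To make that concrete, the pattern the comment describes is roughly the
following (untested sketch; the function names are just illustrative, not
necessarily what the patch calls them):

    /* Writer, e.g. CreateCheckPoint: make the new values visible to all */
    AcquireAllInsertionSlotsExclusively();
    Insert->RedoRecPtr = NewRedoRecPtr;
    Insert->fullPageWrites = newFullPageWrites;
    ReleaseAllInsertionSlots();

    /* Inserter, in XLogInsert: re-check after grabbing a slot */
    AcquireOneInsertionSlot();
    if (RedoRecPtr != Insert->RedoRecPtr)
    {
        /* our backend-local copy was stale; redo the full-page-write checks */
        RedoRecPtr = Insert->RedoRecPtr;
    }

Because the writer held every slot while changing the fields, any inserter
that holds a slot afterwards is guaranteed to see the new values.
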
>>> * The read of Insert->RedoRecPtr while rechecking whether it's out of
>>> date now is unlocked, is that correct? And why?
>
>> Same here, XLogInsert holds the slot while rechecking Insert->RedoRecptr.
>
> I was wondering whether it's guaranteed that we don't read a cached
> value, but I didn't think of the memory barriers due to the spinlock in
> Release/AcquireSlot. I think I thought that release wouldn't acquire the
> spinlock at all.
>
>>> * Have you measured whether it works to just keep as many slots around
>>> as max_backends requires, and not bother with dynamically allocating
>>> them to inserters?
>>> That seems to require keeping the actually-waiting slots in a separate
>>> list, which might well make it too expensive.
>>>
>>> Correctness-wise, the biggest problem for that is probably the exclusive
>>> acquisition of all slots that CreateCheckpoint() does...
>>
>> Hmm. It wouldn't be much different, each backend would still need to reserve
>> its own dedicated slot, because it might be held by a backend that
>> grabbed all the slots. Also, bgwriter and checkpointer need to flush the
>> WAL, so they'd need slots too.
>>
>> More slots make WaitXLogInsertionsToFinish() more expensive, as it has to
>> loop through all of them. IIRC some earlier pgbench tests I ran didn't show
>> much difference in performance, whether there were 40 slots or 100, as long
>> as there were enough of them. I can run some more tests with even more slots
>> to see how it behaves.
>
> In a very quick test I ran previously, I did see the slot acquisition in
> the profile when using short connections that all did some quick inserts
> - which isn't surprising since they tend to all start with the same
> slot so they will all fight for the first slots.
>
> Maybe we could ameliorate that by initializing MySlotNo to
> (MyProc->pgprocno % num_xloginsert_slots) when uninitialized? That won't
> be completely fair since the number of procs won't usually be a multiple
> of num_insert_slots, but I doubt that will be problematic.
>
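Something like this, presumably (sketch; assuming MySlotNo is the
backend-local slot index and -1 means "not assigned yet"):

    if (MySlotNo == -1)
        MySlotNo = MyProc->pgprocno % num_xloginsert_slots;

That would at least spread the initial probes across the slot array instead
of every new backend starting from slot zero.
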
>>> * What about using some sort of linked list of unused slots for
>>> WALInsertSlotAcquireOne? Every time we're done, we put it at the *end*
>>> of the list so it's less likely to have been grabbed by somebody else
>>> so we can reuse it.
>>> a) To grab a new one, go to the head of the linked list, spinlock it, and
>>> recheck whether it's still free. If not, restart. Otherwise, remove it
>>> from the list.
>>> b) To reuse a previously used slot
>>>
>>> That way we only have to do the PGSemaphoreLock() dance if there
>>> really aren't any free slots.
>>
>> That adds a spinlock acquisition / release into the critical path, to
>> protect the linked list. I'm trying very hard to avoid serialization points
>> like that.
>
> Hm. We already have at least one spinlock in that path? Or are you
> thinking of a global one protecting the linked list? If so, I think we
> should be able to get away with locking individual slots.
> IIRC if you never need to delete list elements you can even get away
> with a lockless implementation.
>
>> While profiling the tests I've done, finding a free slot hasn't shown up
>> much. So I don't think it's a problem performance-wise as it is, and I don't
>> think it would make the code simpler.
>
> It sure wouldn't make it simpler. As I said above, I saw the slot
> acquisition in a profile when using -C and a short pgbench script (that
> just inserts 10 rows).
>
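To spell out what that scheme would look like, and where the extra spinlock
traffic comes from (sketch; freelist_head, freelist_tail, freelist_lck and
next_free are made-up names, not anything in the patch):

    /* Acquire: pop an idle slot from the head of the shared free list */
    SpinLockAcquire(&freelist_lck);
    slot = freelist_head;
    if (slot != NULL)
    {
        freelist_head = slot->next_free;
        if (freelist_head == NULL)
            freelist_tail = NULL;
    }
    SpinLockRelease(&freelist_lck);
    if (slot == NULL)
    {
        /* only now fall back to the PGSemaphoreLock() dance */
    }

    /* Release: push the slot back at the *tail*, so it's less likely that
     * another backend grabs it before we want to reuse it */
    slot->next_free = NULL;
    SpinLockAcquire(&freelist_lck);
    if (freelist_tail != NULL)
        freelist_tail->next_free = slot;
    else
        freelist_head = slot;
    freelist_tail = slot;
    SpinLockRelease(&freelist_lck);

Even if the list were protected by per-slot locks rather than freelist_lck,
the list manipulation itself is the kind of serialization point I'd like to
keep out of the common path.
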
>>> * The queuing logic around slots seems to lack documentation. It's
>>> complex enough to warrant that imo.
>>
>> Yeah, it's complex. I expanded the comments above XLogInsertSlot, to explain
>> how xlogInsertingAt works. Did that help, or was it some other part of that
>> that needs more docs?
>
> What I don't understand so far is why we don't reset xlogInsertingAt
> during SlotReleaseOne. There are places documenting that we don't do so,
> but not why unless I missed it.

We could reset it in SlotReleaseOne. However, the next inserter that
acquires the slot would then need to set it to a valid value again, in
SlotAcquireOne. Leaving it at its previous value is a convenient way to
have it pre-initialized with some value for the next SlotAcquireOne call
that uses the same slot.

Perhaps that should be changed, just to make the logic easier to
understand. I agree it seems confusing. Another way to initialize
xlogInsertingAt in SlotAcquireOne would be, e.g., to use an old value
cached in the same backend. Or just always initialize it to 1, which
would force any WaitXLogInsertionsToFinish() call to wait for the
inserter until it has finished or updated its value. Maybe that
wouldn't make any difference in practice, and it would be less bizarre.
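Spelled out, the "always initialize to 1" variant would be just (sketch; the
spinlock field name is illustrative):

    /* In SlotAcquireOne, after we own the slot: */
    SpinLockAcquire(&slot->mutex);
    slot->xlogInsertingAt = 1;      /* "inserting, position not known yet" */
    SpinLockRelease(&slot->mutex);

Any WaitXLogInsertionsToFinish() call would then conservatively wait for this
slot until the owner advertises a real position or releases it.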

> Do we do this only to have some plausible value for a backend that has
> acquired a slot but hasn't copied data yet?

Yes, exactly.

> If so, why isn't it sufficient to
> initialize it in ReserveXLogInsertLocation?

It would be, but then ReserveXLogInsertLocation would need to hold the
slot's spinlock at the same time as insertpos_lck, so that it could
atomically read the current CurrBytePos value and copy it to
xlogInsertingAt. It's important to keep ReserveXLogInsertLocation() as
lightweight as possible, to maximize concurrency.
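Concretely, it would have to look something like this (fragment; declarations
are elided and BytePosToRecPtr() stands in for the byte-position-to-LSN
conversion, and this is exactly the extra work in the hot path I want to
avoid):

    SpinLockAcquire(&Insert->insertpos_lck);
    SpinLockAcquire(&slot->mutex);   /* a second spinlock in the hottest path */
    startbytepos = Insert->CurrBytePos;
    Insert->CurrBytePos = startbytepos + size;
    slot->xlogInsertingAt = BytePosToRecPtr(startbytepos); /* publish atomically */
    SpinLockRelease(&slot->mutex);
    SpinLockRelease(&Insert->insertpos_lck);

Holding both spinlocks at once serializes every inserter on the slot update as
well, not just on advancing CurrBytePos.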

Unless you initialize it to 1 in SlotAcquireOne - then you can update it
after ReserveXLogInsertLocation() at leisure. But then it might not be
worth the extra spinlock acquisition to update it at all until you finish
the insertion and release the slot (or move to the next page), which makes
this identical to the idea I pondered above.
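I.e. with xlogInsertingAt pre-set to 1, the update could happen outside
insertpos_lck, or be skipped entirely until the slot is released or we cross
onto a new page (sketch; the ReserveXLogInsertLocation() call is simplified
here):

    StartPos = ReserveXLogInsertLocation(size);   /* takes only insertpos_lck */

    /* optional: without this, waiters simply wait a bit longer on this slot */
    SpinLockAcquire(&slot->mutex);
    slot->xlogInsertingAt = StartPos;
    SpinLockRelease(&slot->mutex);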

I'll do some testing with that. If it performs OK, initializing
xlogInsertingAt to 1 is clearer than leaving it at its old value.

- Heikki
