From: | Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit) |
Date: | 2017-08-13 15:05:24 |
Message-ID: | 20170813180524.41a8bdd7@falcon-work |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
В Fri, 11 Aug 2017 14:05:08 -0400
Robert Haas <robertmhaas(at)gmail(dot)com> пишет:
> On Thu, Aug 10, 2017 at 11:12 AM, Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > These results look very cool!
> > I think CSN is eventually inevitable, but it's a long distance
> > feature. Thus, this optimization could make us a good serve before
> > we would have CSN. Do you expect there are cases when this patch
> > causes slowdown? What about case when we scan each xip array only
> > once (not sure how to simulate that using pgbench)?
>
> Just a random thought here:
>
> The statements pgbench executes are pretty simple: they touch one row
> in one table. You wouldn't expect them to scan the xip array very
> many times. If even those statements touch the array enough for this
> to win, it seems like it might be hard to construct an even worse
> case. I might be missing something, though.
With zipfian distribution, many concurrent transactions tries to update
the same row. Every transaction eventually updates this row (may be
after waiting), so there are a lot of concurrent row version, and
transaction scans through its snapshot many times.
>
> We're not going to accept code like this, though:
>
> + d = xip[i] >> 6;
> + j = k = xip[i] & mask;
> + while (xiphash[j] != InvalidTransactionId)
> + {
> + j = (k + d) & mask;
> + d = d*5 + 1;
> + }
>
> Single-character variable names, hard-coded constants, and no
> comments!
I totally agree. First version (which you've cited) was clear POC.
Second version (I've sent in Thursday) looks a bit better, but I agree
there is room for improvement. I'll try to make it prettier.
BTW, I have a question: there is CopySnapshot function, so snapshot is
clearly changed after copy. But I could not follow: is xip and subxip
array changes in a copy, or it remains the same to original, but only
other snapshot fields could be changed?
> I kind of doubt as a general point that we really want another
> open-coded hash table -- I wonder if this could be made to use
> simplehash.
I like simplehash very much (although I'm missing couple of features).
Unfortunately, siplehash is not simple enough for this use case:
- it will use at least twice more memory for hash table itself (cause
it have to have 'entry->status' field),
- it has large header (ad-hoc hash-table consumes 1 byte in
SnapshotData structure),
- its allocation will be trickier (custom hash-table is co-allocated
after xip-array itself),
- its insert algorithm looks to be much more expensive (at least, it is
more complex in instructions count).
I thing, using simplehash here will lead to much more invasive patch,
than this ad-hoc table. And I believe it will be slower (and this place
is very performance critical), though I will not bet on.
BTW, ad-hoc hash tables already exist: at least recourse owner uses one.
--
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Berg | 2017-08-13 15:12:30 | Re: initdb failure on Debian sid/mips64el in EventTriggerEndCompleteQuery |
Previous Message | Andreas Seltenreich | 2017-08-13 13:48:00 | Re: Server crash (FailedAssertion) due to catcache refcount mis-handling |