From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: pgsql: Compute XID horizon for page level index vacuum on primary. |
Date: | 2019-03-29 15:58:14 |
Message-ID: | CANP8+jLEWNQX9oW0RQPPvOXFOh3zEBUdC62QWZ2GLNkeZmXnPA@mail.gmail.com |
Lists: | pgsql-committers pgsql-hackers |
On Fri, 29 Mar 2019 at 15:29, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2019-03-29 09:37:11 +0000, Simon Riggs wrote:
>
> > While trying to understand this, I see there is an even better way to
> > optimize this. Since we are removing dead index tuples, we could alter
> > the killed index tuple interface so that it returns the xmax of the
> > tuple being marked as killed, rather than just a boolean to say it is
> > dead.
>
> Wouldn't that quite possibly result in additional and unnecessary
> conflicts? Right now the page level horizon is computed whenever the
> page is actually reused, rather than when an item is marked as
> deleted. As it stands right now, the computed horizons are commonly very
> "old", because of that delay, leading to lower rates of conflicts.
>
I wasn't suggesting we change when the horizon is calculated, so no change
there. The idea was to cache the data for later use, replacing the hint bit
with a hint xid. That won't change the rate of conflicts, up or down, but it
does avoid I/O.

> > Indexes can then mark the killed tuples with the xmax that killed them
> > rather than just a hint bit. This is possible since the index tuples
> > are dead and cannot be used to follow the htid to the heap, so the
> > htid is redundant and so the block number of the tid could be
> > overwritten with the xmax, zeroing the itemid. Each killed item we
> > mark with its xmax means one less heap fetch we need to perform when
> > we delete the page - it's possible we optimize that away completely by
> > doing this.
>
> That's far from a trivial feature imo. It seems quite possible that we'd
> end up with increased overhead, because the current logic can get away
> with only doing hint bit style writes - but would that be true if we
> started actually replacing the item pointers? Because I don't see any
> guarantee they couldn't cross a page boundary etc? So I think we'd need
> to do WAL logging during index searches, which seems prohibitively
> expensive.
>
Don't see that.

I was talking about reusing the first 4 bytes of an index tuple's
ItemPointerData, which is the first field of an index tuple. Index tuples
are MAXALIGNed, so I can't see how that would ever cross a page boundary.

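
To make the layout argument concrete, here is a minimal sketch in C. The struct definitions are simplified stand-ins for PostgreSQL's BlockIdData, ItemPointerData and IndexTupleData rather than the real headers, and both helper functions (killed_tuple_set_xmax, field_crosses_boundary) are hypothetical names invented for illustration. The 512-byte boundary used in the usage note is likewise only an assumed example size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the PostgreSQL on-disk types (not the real headers). */
typedef uint32_t TransactionId;

typedef struct BlockIdData
{
    uint16_t bi_hi;             /* high 16 bits of the heap block number */
    uint16_t bi_lo;             /* low 16 bits of the heap block number */
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;       /* first 4 bytes: heap block number */
    uint16_t    ip_posid;       /* last 2 bytes: offset within the heap page */
} ItemPointerData;

typedef struct IndexTupleData
{
    ItemPointerData t_tid;      /* first field of every index tuple */
    uint16_t        t_info;     /* flags and tuple size */
} IndexTupleData;

/*
 * Hypothetical helper: once an index tuple is known dead, overwrite the
 * 4-byte heap block number with the xmax that killed it.
 */
void
killed_tuple_set_xmax(IndexTupleData *itup, TransactionId xmax)
{
    assert(sizeof(itup->t_tid.ip_blkid) == sizeof(xmax));
    memcpy(&itup->t_tid.ip_blkid, &xmax, sizeof(xmax));
}

/* Hypothetical helper: read the cached xmax back out of a killed tuple. */
TransactionId
killed_tuple_get_xmax(const IndexTupleData *itup)
{
    TransactionId xmax;

    memcpy(&xmax, &itup->t_tid.ip_blkid, sizeof(xmax));
    return xmax;
}

/*
 * Returns 1 if a field of field_len bytes starting at tuple_offset straddles
 * a multiple of boundary, 0 otherwise.
 */
int
field_crosses_boundary(size_t tuple_offset, size_t field_len, size_t boundary)
{
    size_t first_unit = tuple_offset / boundary;
    size_t last_unit = (tuple_offset + field_len - 1) / boundary;

    return first_unit != last_unit;
}
```

With an 8-byte MAXALIGN, every tuple starts at a multiple of 8, so a 4-byte field at the start of the tuple sits inside a single aligned 8-byte word. Calling field_crosses_boundary(504, 4, 512) therefore reports no crossing, whereas an unaligned start such as offset 510 would.
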
> And I'm also doubtful it's worth it because:
>
> > Since this point of the code is clearly going to be a performance
> > issue it seems like something we should do now.
>
> I've tried quite a bit to find a workload where this matters, but after
> avoiding redundant buffer accesses by sorting, and prefetching I was
> unable to do so. What workload do you see where this would really be
> bad? Without the performance optimization I'd found a very minor
> regression by trying to force the heap visits to happen in a pretty
> random order, but after sorting that went away. I'm sure it's possible
> to find a case on overloaded rotational disks where you'd find a small
> regression, but I don't think it'd be particularly bad.
>
The code can do literally hundreds of random I/Os for a single index page
with an 8192-byte block size. What happens with 16 or 32kB?

"Small regression"?
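
As a rough worked bound on "hundreds", the sketch below estimates the maximum number of index tuples on one page. It is modelled on the MaxIndexTuplesPerPage calculation in PostgreSQL. The constants (24-byte page header, 8-byte index tuple header, 4-byte line pointer, 8-byte MAXALIGN) are assumptions for a typical 64-bit build, and the function names are made up for this example. In the worst case each tuple points at a different heap block, so the result is an upper limit on heap fetches per index page, not a measured figure.

```c
#include <stddef.h>

/* Assumed sizes for a typical 64-bit build; not read from PostgreSQL headers. */
#define PAGE_HEADER_SIZE        24  /* page header */
#define INDEX_TUPLE_HEADER_SIZE 8   /* ItemPointerData (6) + t_info (2) */
#define ITEM_ID_SIZE            4   /* line pointer */
#define MAXIMUM_ALIGNOF         8

/* Round len up to the next multiple of MAXIMUM_ALIGNOF. */
size_t
maxalign(size_t len)
{
    return (len + MAXIMUM_ALIGNOF - 1) & ~((size_t) (MAXIMUM_ALIGNOF - 1));
}

/*
 * Upper bound on index tuples per page: usable space divided by the smallest
 * possible tuple (header plus one byte of key, aligned) plus its line pointer.
 */
size_t
max_index_tuples_per_page(size_t blcksz)
{
    return (blcksz - PAGE_HEADER_SIZE) /
        (maxalign(INDEX_TUPLE_HEADER_SIZE + 1) + ITEM_ID_SIZE);
}
```

Under these assumptions the bound is 408 for an 8 kB page, 818 for 16 kB and 1637 for 32 kB, so the worst case grows roughly linearly with the block size.
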
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services