From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Write Ahead Logging for Hash Indexes
Date: 2017-03-16 04:09:37
Message-ID: CAA4eK1+VsOVPqgST-gguC4xV4yfV-AdFfV9HsFoxxp2tyUbL5g@mail.gmail.com
Lists: pgsql-hackers
On Wed, Mar 15, 2017 at 7:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Mar 15, 2017 at 9:18 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>>> I think we have sufficient comments in the code, especially on top
>>> of the function _hash_alloc_buckets().
>>
>> I don't see any comments regarding how we have to be sure to handle
>> an out-of-space case properly in the middle of a file because we've made
>> it sparse.
>>
>> I do see that mdwrite() should handle an out-of-disk-space case, though
>> that just makes me wonder what's different here compared to normal
>> relations: why is a sparse WAL'd hash index not a problem, when we
>> can't handle a normal relation being sparse?
>
> I agree. I think that what hash indexes are doing here is
> inconsistent with what we do for btree indexes and the heap. And I
> don't think it would be bad to fix that. We could, of course, go the
> other way and do what Tom is suggesting - insist that everybody's got
> to be prepared for sparse files, but I would view that as something of
> a U-turn. I think hash indexes are like this because nobody's really
> worried about hash indexes because they haven't been crash-safe or
> performant. Now that we've made them crash safe and are on the way to
> making them performant, fixing other things is, as Tom already said, a
> lot more interesting.
>
> Now, that having been said, I'm not sure it's a good idea to tinker
> with the behavior for v10. We could change the new-splitpoint code so
> that it loops over all the pages in the new splitpoint and zeroes them
> all, instead of just the last one.
>
Yeah, but I am slightly afraid that, apart from consuming too much
space, it will make the operation a lot slower when we have to perform
the allocation step. Another possibility is to do the allocation when
we actually use/allocate the bucket. As of now, _hash_getnewbuf()
assumes that the block already exists and avoids the actual
read/extend by using RBM_ZERO_AND_LOCK mode. I think we can change it
so that it forces a new allocation when the block doesn't exist yet
(rough sketches of both approaches are below). I agree this is
somewhat more change than what you have proposed to circumvent the
sparse-file behavior, but it may not be a ton more work if we decide
to fix it, and it has the advantage of allocating the space on disk
only when it is actually needed.
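
For concreteness, I read your suggestion as roughly the loop below in
_hash_alloc_buckets(), i.e. smgrextend() a zeroed page for every block
of the new splitpoint instead of only the last one. This is only an
untested sketch: the helper name and its arguments are made up for
illustration, "zerobuf" is assumed to be the BLCKSZ buffer that
function already initializes as an empty hash page, and WAL-logging of
the individual pages is left out.

#include "postgres.h"
#include "storage/bufpage.h"
#include "storage/smgr.h"
#include "utils/rel.h"

/*
 * Untested sketch: physically write a zeroed page for every block of the
 * new splitpoint, instead of writing only the last block and leaving a
 * hole in the file.
 */
static void
hash_extend_splitpoint(Relation rel, BlockNumber firstblock,
                       BlockNumber lastblock, char *zerobuf)
{
    BlockNumber blkno;

    RelationOpenSmgr(rel);      /* make sure rel->rd_smgr is valid */

    for (blkno = firstblock; blkno <= lastblock; blkno++)
    {
        /* the page checksum depends on the block number */
        PageSetChecksumInplace((Page) zerobuf, blkno);
        smgrextend(rel->rd_smgr, MAIN_FORKNUM, blkno, zerobuf, false);
    }
}

That obviously allocates all the space for the splitpoint up front,
which is what makes me worry about the cost for the larger splitpoints.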
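
The on-demand variant I have in mind would look roughly like this;
again only an untested sketch, and the function name is made up for
illustration -- the real change would be inside _hash_getnewbuf() or
its callers.

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Untested sketch: return a zeroed, exclusively locked buffer for blkno,
 * extending the relation on demand if that block does not exist yet,
 * instead of assuming _hash_alloc_buckets() has already created it.
 */
static Buffer
hash_getbuf_extend(Relation rel, BlockNumber blkno, ForkNumber forkNum)
{
    BlockNumber nblocks = RelationGetNumberOfBlocksInFork(rel, forkNum);
    Buffer      buf;

    if (blkno < nblocks)
    {
        /* block already exists on disk: zero it in memory and lock it */
        buf = ReadBufferExtended(rel, forkNum, blkno, RBM_ZERO_AND_LOCK, NULL);
    }
    else
    {
        /* extend the relation one block at a time until blkno appears */
        for (;;)
        {
            buf = ReadBufferExtended(rel, forkNum, P_NEW, RBM_NORMAL, NULL);
            if (BufferGetBlockNumber(buf) == blkno)
                break;
            ReleaseBuffer(buf);
        }
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
    }

    /* the caller initializes the page, as with _hash_getnewbuf() today */
    return buf;
}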
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com