From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: BRIN multi-range indexes |
Date: | 2020-09-10 18:32:48 |
Message-ID: | 20200910183248.eajizgzuypntsjmh@development |
Lists: | pgsql-hackers |
On Wed, Sep 09, 2020 at 10:26:00PM +0200, Tomas Vondra wrote:
>On Wed, Sep 09, 2020 at 04:53:30PM -0300, Alvaro Herrera wrote:
>>On 2020-Sep-09, Tomas Vondra wrote:
>>
>>>There are some minor optimizations possible - for example I noticed we
>>>call minmax_multi_get_strategy_procinfo often because it happens in a
>>>loop, and we could easily do it just once. But that saves only about 10%
>>>or so, it's not a ground-breaking optimization.
>>
>>Well, I guess this kind of thing should be fixed regardless while we
>>still know it's there, just to avoid an obvious inefficiency.
>>
>
>Sure. I was just suggesting it's not something that'd make this very
>close to plain minmax opclass.
>
>>>The main reason for the slowness is that we pass the values one by one
>>>to brin_minmax_multi_add_value - and on each call we need to deserialize
>>>(and then sometimes also serialize) the summary, which may be quite
>>>expensive. The regular minmax does not have this issue, it just swaps
>>>the Datum value and that's it.
>>
>>Ah, right, that's more interesting. The original dumb BRIN code
>>separates BrinMemTuple from BrinTuple so that things can be operated
>>efficiently in memory. Maybe something similar can be done in this
>>case, which also sounds like your second suggestion:
>>
>>>Another option would be to teach add_value to keep the deserialized
>>>summary somewhere, and then force serialization at the end of the BRIN
>>>page range. The end result would be roughly the same, I think.
>>
>
>Well, the patch already has Ranges (memory) and SerializedRanges (disk)
>but it's not very clear to me where to stash the in-memory data and
>where to make the conversion.
>
I've spent a bit of time experimenting with this. My idea was to allow
keeping an "expanded" version of the summary somewhere. As the addValue
function only receives BrinValues, I guess one option would be to just
add a bv_mem_values field. Or do you have a better idea?

Of course, more would need to be done:

1) We'd need to also pass the right memory context (bt_context seems
like the right thing, but that's not something addValue sees now).
2) We'd also need to specify some sort of callback that serializes the
in-memory value into bt_values. That's not something addValue can do,
because it doesn't know whether it's the last value in the range etc.
One option would be to add yet another support proc, but I think a
simple callback would be enough.
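
The two points above can be sketched as a small standalone C model. This is only an illustration of the idea, not code from the patch: ToyBrinValues, ToyRanges, bv_mem_value, bv_serialize, toy_add_value and toy_finish_range are all hypothetical names, the "summary" is reduced to a single min/max pair, and malloc/free stand in for allocating in a memory context such as bt_context. It also ignores deserializing an existing on-disk summary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model; none of these names come from the patch. */
typedef struct ToyBrinValues ToyBrinValues;
typedef void (*toy_serialize_fn) (ToyBrinValues *bval);

/* Expanded (in-memory) summary, cheap to update. */
typedef struct ToyRanges
{
	int			min;
	int			max;
} ToyRanges;

struct ToyBrinValues
{
	unsigned char bv_values[2 * sizeof(int)];	/* serialized form */
	void	   *bv_mem_value;	/* expanded form, or NULL */
	toy_serialize_fn bv_serialize;	/* set by the opclass on first use */
	int			serialize_calls;	/* instrumentation for the example */
};

/* Callback: flush the expanded summary into bv_values and release it. */
void
toy_serialize(ToyBrinValues *bval)
{
	ToyRanges  *r = bval->bv_mem_value;

	if (r == NULL)
		return;					/* nothing expanded, nothing to do */

	memcpy(bval->bv_values, &r->min, sizeof(int));
	memcpy(bval->bv_values + sizeof(int), &r->max, sizeof(int));
	bval->serialize_calls++;

	free(r);
	bval->bv_mem_value = NULL;
}

/* addValue analogue: only touches the expanded form, never serializes. */
void
toy_add_value(ToyBrinValues *bval, int v)
{
	ToyRanges  *r = bval->bv_mem_value;

	if (r == NULL)
	{
		/* The real thing would allocate in the tuple's memory context. */
		r = malloc(sizeof(ToyRanges));
		if (r == NULL)
			abort();
		r->min = r->max = v;
		bval->bv_mem_value = r;
		bval->bv_serialize = toy_serialize;
	}

	if (v < r->min)
		r->min = v;
	if (v > r->max)
		r->max = v;
}

/* Called by the caller that knows the page range is complete. */
void
toy_finish_range(ToyBrinValues *bval)
{
	if (bval->bv_serialize != NULL)
		bval->bv_serialize(bval);
}
```

The point of the split is that toy_add_value never pays the serialization cost; only the code that knows the range is finished calls toy_finish_range, so serialization happens once per page range rather than once per value.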

I've hacked together an experimental version of this to see how much
it would help, and it reduces the duration from ~4.6s to ~3.3s. That's
nice, but plain minmax is ~1.1s, so this is still about 3x slower. I
suppose there's room for further improvements in
compare_combine_ranges/reduce_combine_ranges and so on, but I still
think there'll always be a gap compared to plain minmax.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services