From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org> |
Subject: | Re: range_agg |
Date: | 2020-12-08 00:20:10 |
Message-ID: | CAPpHfdtRY10Acg4LNCnj6uu0RAF6QZWzmPHd8qcc1aorub-1AQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Dec 8, 2020 at 3:00 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2020-Dec-08, Alexander Korotkov wrote:
>
> > I also found a problem in multirange types naming logic. Consider the
> > following example.
> >
> > create type a_multirange AS (x float, y float);
> > create type a as range(subtype=text, collation="C");
> > create table tbl (x __a_multirange);
> > drop type a_multirange;
> >
> > If you dump this database, the dump can't be restored. The
> > multirange type is named __a_multirange because a type named
> > a_multirange already existed when the range type was created. However,
> > a_multirange may have been dropped since then. When the dump is
> > restored, the multirange type gets the name a_multirange, and creation
> > of the table that references __a_multirange fails. The same thing
> > doesn't happen with arrays, because arrays are not referenced in dumps
> > by their internal names.
> >
> > I think we probably should add an option to specify multirange type
> > names while creating a range type. Then dump can contain exact type
> > names used in the database, and restore wouldn't hit a name
> > collision.
>
> Hmm, good point. I agree that a dump must preserve the name, since once
> created it is user-visible. I had not noticed this problem, but it's
> obvious in retrospect.
>
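To make the collision concrete, here is a minimal sketch of the naming rule as I understand it. It assumes the name starts as "<range>_multirange" and gets '_' prepended until it is free, and that _a_multirange is already taken by the array type of the composite type. The helpers name_taken() and pick_multirange_name() and the NAMEMAX constant are hypothetical stand-ins, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NAMEMAX 64				/* hypothetical cap, standing in for NAMEDATALEN */

/* Hypothetical stand-in for a catalog lookup: is this type name taken? */
static bool
name_taken(const char *name, const char *const *existing, int nexisting)
{
	for (int i = 0; i < nexisting; i++)
		if (strcmp(existing[i], name) == 0)
			return true;
	return false;
}

/*
 * Assumed rule: start from "<range>_multirange" and prepend '_' until the
 * name is free.  Returns false if no free name fits into the buffer.
 */
static bool
pick_multirange_name(const char *range_name,
					 const char *const *existing, int nexisting,
					 char *out, size_t outlen)
{
	int			n;

	assert(out != NULL && outlen > 0);

	n = snprintf(out, outlen, "%s_multirange", range_name);
	if (n < 0 || (size_t) n >= outlen)
		return false;

	while (name_taken(out, existing, nexisting))
	{
		size_t		len = strlen(out);

		if (len + 2 > outlen)	/* one more char plus the terminator */
			return false;
		memmove(out + 1, out, len + 1); /* shift, including the terminator */
		out[0] = '_';
	}
	return true;
}
```

With a_multirange and _a_multirange present, the rule yields __a_multirange. With neither present (the state during restore, after the composite type was dropped), it yields a_multirange, so the name depends on catalog state that the dump does not record.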
> > In general, I wonder if we can make the binary format of multiranges
> > more efficient. It seems that every function involving multiranges
> > starts from multirange_deserialize(). I think we can make functions like
> > multirange_contains_elem() much more efficient. Multirange is
> > basically an array of ranges. So we can pack it as follows.
> > 1. Typeid and rangecount
> > 2. Tightly packed array of flags (1 byte for each range)
> > 3. Array of indexes of boundaries (4 bytes for each range). Or even
> > better we can combine offsets and lengths to be compression-friendly
> > like jsonb JEntries do.
> > 4. Boundary values
> > Using this format, we can implement multirange_contains_elem(),
> > multirange_contains_range() without deserialization and using binary
> > search. That would be much more efficient. What do you think?
>
> I also agree. I spent some time staring at the I/O code a couple of
> months back but was unable to focus on it for long enough. I don't know
> JEntry's format, but I do remember that the storage format for JSONB was
> widely discussed back then; it seems wise to apply similar logic or at
> least similar reasoning.
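As a rough illustration of the packed layout discussed above, here is a simplified sketch of a containment check that binary-searches the buffer in place. Everything in it is an assumption made for the example: bounds are plain doubles, the flag bits LB_INC/UB_INC are invented, there is no varlena header, and empty or infinite bounds are ignored. The names pack_ranges() and packed_contains_elem() are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits; the real range flags differ. */
#define LB_INC 0x01				/* lower bound is inclusive */
#define UB_INC 0x02				/* upper bound is inclusive */

/*
 * Assumed buffer layout (native byte order):
 *	 uint32 typid; uint32 count;
 *	 uint8	flags[count];
 *	 uint32 offsets[count];	  byte offset of each range in the data area
 *	 double bounds[2 * count]; lower, upper for each range
 * Ranges are sorted and non-overlapping, as in a multirange.
 */
static uint32_t
read_u32(const unsigned char *p)
{
	uint32_t	v;

	memcpy(&v, p, sizeof v);	/* memcpy avoids unaligned reads */
	return v;
}

static double
read_f64(const unsigned char *p)
{
	double		v;

	memcpy(&v, p, sizeof v);
	return v;
}

/* Build the buffer from parallel arrays; returns the bytes written. */
static size_t
pack_ranges(unsigned char *buf, uint32_t typid, uint32_t count,
			const unsigned char *flags, const double *bounds)
{
	unsigned char *p = buf;

	memcpy(p, &typid, 4); p += 4;
	memcpy(p, &count, 4); p += 4;
	memcpy(p, flags, count); p += count;
	for (uint32_t i = 0; i < count; i++)
	{
		uint32_t	off = i * 2 * (uint32_t) sizeof(double);

		memcpy(p, &off, 4); p += 4;
	}
	memcpy(p, bounds, (size_t) count * 2 * sizeof(double));
	p += (size_t) count * 2 * sizeof(double);
	return (size_t) (p - buf);
}

/* Binary search over the packed ranges, without building any range array. */
static bool
packed_contains_elem(const unsigned char *buf, double elem)
{
	uint32_t	count;
	const unsigned char *flags, *offsets, *data;
	uint32_t	lo = 0, hi;

	assert(buf != NULL);
	count = read_u32(buf + 4);
	flags = buf + 8;
	offsets = flags + count;
	data = offsets + 4 * (size_t) count;
	hi = count;

	while (lo < hi)
	{
		uint32_t	mid = lo + (hi - lo) / 2;
		const unsigned char *b = data + read_u32(offsets + 4 * (size_t) mid);
		double		lower = read_f64(b);
		double		upper = read_f64(b + sizeof(double));
		unsigned char f = flags[mid];

		if (elem < lower || (elem == lower && !(f & LB_INC)))
			hi = mid;			/* elem lies before this range */
		else if (elem > upper || (elem == upper && !(f & UB_INC)))
			lo = mid + 1;		/* elem lies after this range */
		else
			return true;
	}
	return false;
}
```

With fixed-width bounds the offsets array is redundant, but it is kept here because variable-length subtypes would need it to locate each range's bounds without scanning.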
Thank you for your feedback!
I'd like to publish my revision of the patch so that Paul can start
from it. The changes I made are minor:
1. Add missing types to typedefs.list
2. Run pgindent over the changed files, plus some other formatting changes
3. Reorder the regression tests to avoid the error spotted by
commitfest.cputube.org
I'm switching this patch to WOA (Waiting on Author).
------
Regards,
Alexander Korotkov
Attachment | Content-Type | Size |
---|---|---|
v25-multirange.patch | application/octet-stream | 430.2 KB |