Re: Custom opclass for column statistics?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Ancoron Luciferis <ancoron(dot)luciferis(at)googlemail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Custom opclass for column statistics?
Date: 2019-07-06 15:57:53
Message-ID: 20190706155753.4yf5i4lezx42x5os@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sat, Jul 06, 2019 at 05:35:33PM +0200, Ancoron Luciferis wrote:
>On 06/07/2019 15:38, Tomas Vondra wrote:
>> On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote:
>>> Hi,
>>>
>>> I've been wondering whether it is possible somehow to have the standard
>>> column statistics to respect a certain operator class?
>>>
>>> The reason why I am asking for this is that I have a UUID column with a
>>> unique index at it using a custom operator class which implies a
>>> different sort order than for the default UUID operator class.
>>>
>>> This results into planner mistakes when determining whether to use the
>>> index for row selection or not. Too often it falls back into sequential
>>> scan due to this.
>>>
>>
>> Can you share an example demonstrating the issue?
>>
>>
>> regards
>>
>
>Yes, I have an opclass as follows:
>
>CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid
> USING btree AS
> OPERATOR 1 <*,
> OPERATOR 1 <~ (uuid, timestamp with time zone),
> OPERATOR 2 <=*,
> OPERATOR 2 <=~ (uuid, timestamp with time zone),
> OPERATOR 3 =,
> OPERATOR 3 =~ (uuid, timestamp with time zone),
> OPERATOR 4 >=*,
> OPERATOR 4 >=~ (uuid, timestamp with time zone),
> OPERATOR 5 >*,
> OPERATOR 5 >~ (uuid, timestamp with time zone),
> FUNCTION 1 uuid_timestamp_cmp(uuid, uuid),
> FUNCTION 1 uuid_timestamp_only_cmp(uuid, timestamp
>with time zone),
> FUNCTION 2 uuid_timestamp_sortsupport(internal)
>;
>
>...and e.g. operator "<*" is defined as:
>
>CREATE FUNCTION uuid_timestamp_lt(uuid, uuid)
>RETURNS bool
>AS 'MODULE_PATHNAME', 'uuid_timestamp_lt'
>LANGUAGE C
>IMMUTABLE
>LEAKPROOF
>STRICT
>PARALLEL SAFE;
>
>COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than';
>
>CREATE OPERATOR <* (
> LEFTARG = uuid,
> RIGHTARG = uuid,
> PROCEDURE = uuid_timestamp_lt,
> COMMUTATOR = '>*',
> NEGATOR = '>=*',
> RESTRICT = scalarltsel,
> JOIN = scalarltjoinsel
>);
>
>
>The function "uuid_timestamp_lt" is basically defined as follows:
>1. if not version 1 UUID fallback to standard uuid compare
>2. extract timestamp values and compare
>3. if equal timestamps fallback to standard uuid compare
>
>...so that a chronological order is established.
>
>
>The test table is created as follows:
>
>CREATE TABLE uuid_v1_ext (id uuid);
>CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops);
>
>
>The values for "histogram_bounds" of the test table look like this (due
>to the default sort order for standard type UUID):
>
>00003789-97bf-11e9-b6bb-e03f49f7f733
>008b88f8-6deb-11e9-901a-e03f4947f477
>010a8b22-586a-11e9-8258-e03f49ce78f3
>...
>6f682e68-978d-11e9-901a-e03f4947f477
>6ff412ee-926f-11e9-901a-e03f4947f477
>7079ffe2-642f-11e9-b0cc-e03f49e7fd3b
>70ffaeca-4645-11e9-adf9-e03f494677fb
>...
>fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b
>ff779ce8-9e52-11e9-8258-e03f49ce78f3
>ffff6bfc-4de4-11e9-b0d4-e03f49d6f6bf
>
>...and I think that's where the planner gets the decision for a query
>such as:
>
>DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3';
>
>...which then get's executed as sequential scan instead of an index scan.
>
>I was also thinking about changing the selectivity function used by the
>custom operator, but I didn't find any hints how to implement that
>without duplicating a lot of internal code.
>

Not sure, I'm not very familiar with this code, so I'd have to play with
it and try things. But that's hard when I don't have any code. Would it
be possible to share a small self-contained test case?

I wonder what does uuid_timestamp_cmp do? I suppose it first compares by
a timestamp extracted from the UUID, right?

It'd be interesting to see

(a) statistics for the column from pg_stats, both for the table and
index (which should have been built using the custom opclass, I think).

(b) EXPLAIN ANALYZE for queries with your opclass, and perhaps with the
default one (that can't use the timestamp condition, but it should be
possible to generate smallers/largest uuid for a timestamp).

BTW which PostgreSQL version is this?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ancoron Luciferis 2019-07-06 23:17:09 Re: Custom opclass for column statistics?
Previous Message Ancoron Luciferis 2019-07-06 15:35:33 Re: Custom opclass for column statistics?