Re: Custom opclass for column statistics?

From: Ancoron Luciferis <ancoron(dot)luciferis(at)googlemail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Custom opclass for column statistics?
Date: 2019-07-06 15:35:33
Message-ID: 8ef092e6-d0c1-4f7a-86ae-e708e1322337@googlemail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 06/07/2019 15:38, Tomas Vondra wrote:
> On Sat, Jul 06, 2019 at 11:02:27AM +0200, Ancoron Luciferis wrote:
>> Hi,
>>
>> I've been wondering whether it is possible somehow to have the standard
>> column statistics to respect a certain operator class?
>>
>> The reason why I am asking for this is that I have a UUID column with a
>> unique index at it using a custom operator class which implies a
>> different sort order than for the default UUID operator class.
>>
>> This results into planner mistakes when determining whether to use the
>> index for row selection or not. Too often it falls back into sequential
>> scan due to this.
>>
>
> Can you share an example demonstrating the issue?
>
>
> regards
>

Yes, I have an opclass as follows:

CREATE OPERATOR CLASS uuid_timestamp_ops FOR TYPE uuid
USING btree AS
OPERATOR 1 <*,
OPERATOR 1 <~ (uuid, timestamp with time zone),
OPERATOR 2 <=*,
OPERATOR 2 <=~ (uuid, timestamp with time zone),
OPERATOR 3 =,
OPERATOR 3 =~ (uuid, timestamp with time zone),
OPERATOR 4 >=*,
OPERATOR 4 >=~ (uuid, timestamp with time zone),
OPERATOR 5 >*,
OPERATOR 5 >~ (uuid, timestamp with time zone),
FUNCTION 1 uuid_timestamp_cmp(uuid, uuid),
FUNCTION 1 uuid_timestamp_only_cmp(uuid, timestamp
with time zone),
FUNCTION 2 uuid_timestamp_sortsupport(internal)
;

...and e.g. operator "<*" is defined as:

CREATE FUNCTION uuid_timestamp_lt(uuid, uuid)
RETURNS bool
AS 'MODULE_PATHNAME', 'uuid_timestamp_lt'
LANGUAGE C
IMMUTABLE
LEAKPROOF
STRICT
PARALLEL SAFE;

COMMENT ON FUNCTION uuid_timestamp_lt(uuid, uuid) IS 'lower than';

CREATE OPERATOR <* (
LEFTARG = uuid,
RIGHTARG = uuid,
PROCEDURE = uuid_timestamp_lt,
COMMUTATOR = '>*',
NEGATOR = '>=*',
RESTRICT = scalarltsel,
JOIN = scalarltjoinsel
);

The function "uuid_timestamp_lt" is basically defined as follows:
1. if not version 1 UUID fallback to standard uuid compare
2. extract timestamp values and compare
3. if equal timestamps fallback to standard uuid compare

...so that a chronological order is established.

The test table is created as follows:

CREATE TABLE uuid_v1_ext (id uuid);
CREATE UNIQUE INDEX idx_uuid_v1_ext ON uuid_v1_ext (id uuid_timestamp_ops);

The values for "histogram_bounds" of the test table look like this (due
to the default sort order for standard type UUID):

00003789-97bf-11e9-b6bb-e03f49f7f733
008b88f8-6deb-11e9-901a-e03f4947f477
010a8b22-586a-11e9-8258-e03f49ce78f3
...
6f682e68-978d-11e9-901a-e03f4947f477
6ff412ee-926f-11e9-901a-e03f4947f477
7079ffe2-642f-11e9-b0cc-e03f49e7fd3b
70ffaeca-4645-11e9-adf9-e03f494677fb
...
fef26b41-9b9d-11e9-b0cc-e03f49e7fd3b
ff779ce8-9e52-11e9-8258-e03f49ce78f3
ffff6bfc-4de4-11e9-b0d4-e03f49d6f6bf

...and I think that's where the planner gets the decision for a query
such as:

DELETE FROM uuid_v1_ext WHERE id <* '4bdf6f81-56ad-11e9-8258-e03f49ce78f3';

...which then get's executed as sequential scan instead of an index scan.

I was also thinking about changing the selectivity function used by the
custom operator, but I didn't find any hints how to implement that
without duplicating a lot of internal code.

Cheers,

Ancoron

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2019-07-06 15:57:53 Re: Custom opclass for column statistics?
Previous Message Tom Lane 2019-07-06 13:58:58 Re: Custom opclass for column statistics?