Re: GiST support for inet datatypes

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: emre(at)hasegeli(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-19 23:17:04
Message-ID: 52DC5CF0.6070908@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here comes part 2 of 2 of the review.

inet-gist
---------

In general the code looks good and I think your approach makes sense.
Not an expert on GiST though so I would like a second opinion on this.
Maybe there is some clever trick which would make the index more
efficient, but the design I see is simple and logical. Like this much
more than the incorrect GiST index for inet in btree_gist.

The only thing is that I think your index would do poorly on the case
where all values share a prefix before the netmask but have different
values after the netmask, since gist_union and gist_penalty does not
care about the bits after the netmask. Am I correct? Have you thought
anything about this?

To create such data:

CREATE TABLE a (ip inet);
INSERT INTO a SELECT '127.0.0.0/16'::inet + s FROM generate_series(1,
pow(2, 16)::int) s;
CREATE INDEX aidx ON a USING gist (ip);

Saw also some minor things too.

Typo in comment, weird sentence:

* The main question to calculate the union is that they have how many
* bits in common. [...]

I do not like how you called the result key i inet_union_gist() "tmp".
Something like "result" or "result_ip" would be better.

Typo in comment, "reverse" should be "inverse":

* Penalty is reverse of the common IP bits of the two addresses.

Typo in comment, missing "be":

* Values bigger than 1 are used when the common IP bits cannot
* calculated.

Language can be improved in this comment. Both ways to split are by the
union of the keys, it is more of a split by address prefix.

* The second and the important way is to split by the union of the keys.
* Union of the keys calculated same way with the inet_gist_union function.
* The first and the last biggest subnets created from the calculated
* union. In this case addresses contained by the first subnet will be put
* to the left bucket, address contained by the last subnet will be put to
* the right bucket.

Could you not just use memcmp in inet_gist_same() instead of bitncmp()
since it only cares about equality?

There is an extra empty line at the end of network_gist.c

inet-selfuncs
-------------

Here I have to honestly admit I am not sure if I can tell if your
selectivity function is correct. Your explanation sounds convincing and
the general idea of looking at the histogram is correct. I am just have
no idea if the details are right.

DEFAULT_NETWORK_OVERLAP_SELECTIVITY should be renamed
DEFAULT_NETWORK_OVERLAP_SEL and moved to the .c file.

Typo in comment, "constrant" -> "constant".

There is an extra empty line at the end of network_selfuncs.c

--
Andreas Karlsson

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2014-01-19 23:36:51 Re: [PATCH] Fix double-inclusion of pg_config_os.h when building extensions with Visual Studio
Previous Message MauMau 2014-01-19 22:35:10 Re: [bug fix] pg_ctl always uses the same event source