From: | Andreas Karlsson <andreas(at)proxel(dot)se> |
---|---|
To: | emre(at)hasegeli(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: GiST support for inet datatypes |
Date: | 2014-01-19 23:17:04 |
Message-ID: | 52DC5CF0.6070908@proxel.se |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Here comes part 2 of 2 of the review.
inet-gist
---------
In general the code looks good and I think your approach makes sense.
Not an expert on GiST though so I would like a second opinion on this.
Maybe there is some clever trick which would make the index more
efficient, but the design I see is simple and logical. Like this much
more than the incorrect GiST index for inet in btree_gist.
The only thing is that I think your index would do poorly on the case
where all values share a prefix before the netmask but have different
values after the netmask, since gist_union and gist_penalty does not
care about the bits after the netmask. Am I correct? Have you thought
anything about this?
To create such data:
CREATE TABLE a (ip inet);
INSERT INTO a SELECT '127.0.0.0/16'::inet + s FROM generate_series(1,
pow(2, 16)::int) s;
CREATE INDEX aidx ON a USING gist (ip);
Saw also some minor things too.
Typo in comment, weird sentence:
* The main question to calculate the union is that they have how many
* bits in common. [...]
I do not like how you called the result key i inet_union_gist() "tmp".
Something like "result" or "result_ip" would be better.
Typo in comment, "reverse" should be "inverse":
* Penalty is reverse of the common IP bits of the two addresses.
Typo in comment, missing "be":
* Values bigger than 1 are used when the common IP bits cannot
* calculated.
Language can be improved in this comment. Both ways to split are by the
union of the keys, it is more of a split by address prefix.
* The second and the important way is to split by the union of the keys.
* Union of the keys calculated same way with the inet_gist_union function.
* The first and the last biggest subnets created from the calculated
* union. In this case addresses contained by the first subnet will be put
* to the left bucket, address contained by the last subnet will be put to
* the right bucket.
Could you not just use memcmp in inet_gist_same() instead of bitncmp()
since it only cares about equality?
There is an extra empty line at the end of network_gist.c
inet-selfuncs
-------------
Here I have to honestly admit I am not sure if I can tell if your
selectivity function is correct. Your explanation sounds convincing and
the general idea of looking at the histogram is correct. I am just have
no idea if the details are right.
DEFAULT_NETWORK_OVERLAP_SELECTIVITY should be renamed
DEFAULT_NETWORK_OVERLAP_SEL and moved to the .c file.
Typo in comment, "constrant" -> "constant".
There is an extra empty line at the end of network_selfuncs.c
--
Andreas Karlsson
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2014-01-19 23:36:51 | Re: [PATCH] Fix double-inclusion of pg_config_os.h when building extensions with Visual Studio |
Previous Message | MauMau | 2014-01-19 22:35:10 | Re: [bug fix] pg_ctl always uses the same event source |