From: | InterRob <rob(dot)marjot(at)gmail(dot)com> |
---|---|
To: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | UNIQUE constraint on character sequences |
Date: | 2011-05-14 15:20:01 |
Message-ID: | BANLkTikmZJ0-P6f5ax-b_4mzCLmDSS36Pg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Dear list,
I would be pleased if you could share some thoughts with me on the
following: say I wish to maintain a table with all distinct character
sequences (variable length) showing series with strong similarities.
Example:
"abbbabacccdef"
"abbbabaccdcdf"
"abbbabaccdcgf"
...
"qwtrhdffdd"
...
"qwtrhdffdds"
...
"qwtrhdffddsspp"
"qwtrhdffddsspf"
"qwtrhdffddssph"
"qwtrhdffddsspL"
etc.
Think of them as ordered values (array-like), as a set having many values in
common, in the same "elements" (that is: positions; in my application, each
position has some particular meaning -- the sequence represents a set of
particular settings)
CREATE TABLE textseqs(txtseq TEXT)
What would be an efficient approach in enforcing a UNIQUE constraint?
I was thinking of using hashbuckets in a b-tree:
CREATE UNIQUE INDEX ON textseqs USING BTREE( hashtext(txtseq), txtseq )
This index would "cache" hashes for each row. Upon inserting of a new row,
traversing the index involves the comparison of two single integers for each
node in the b-tree, until the actual hash value (if it exists) was reached.
Then, only within that bucket (the hashes won't be unique), the more
expensive string comparing is required; involving a sequencial comparison of
(potentially maaaany) characters. Yet, within that bucket, the character
series may be expected to show stronger differences than a plain sorted list
of all values in the table would have, indexed by a b-tree index. Wouldn't
traversing such a plain (non-composite, single column, on: "txtseq") b-tree
index involve a sequencial comparison of (potentially many) characters *at
EACH NODE* of the tree ? Or am I mistaken that each node is filled with
actual values from the txtseq column?
Thank you for your input!
Cheers,
Rob
From | Date | Subject | |
---|---|---|---|
Next Message | David W Noon | 2011-05-14 15:33:23 | Re: Duplicated Fields |
Previous Message | Daron Ryan | 2011-05-14 14:35:13 | Duplicated Fields |