Re: UNIQUE constraint on KEY VALUE PAIRS

From: InterRob <rob(dot)marjot(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UNIQUE constraint on KEY VALUE PAIRS
Date: 2011-05-14 18:25:12
Message-ID: BANLkTikkMpBdKWsfR_=1WKpHqikR6rOtPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I am afraid I over-simplyfied my use-case. Sorry.

In fact, I am not using a character sequence, but a custom COMPOSITE TYPE;
part of which is a hstore:
CREATE TYPE SSTORE AS (scope TEXT, kvp hstore);

I created some functions and operators around it -- sort-of costly
operations for comparing SSTOREs. Well: more costly than it would be to
compare INTEGERs, I guess.

The sets of Key/Values held by SSTORE will show extensive similarities, that
is: common pairs of key/value pairs.

So, essentially it looks like this:
CREATE TABLE textseqs(the_key SSTORE)

CREATE UNIQUE INDEX ON textseqs USING BTREE( hash_sstore(the_key), the_key )

Does this make sense? I am seeking some sensible trade-off between
performance and flexibility...

Regards,
Rob

2011/5/14 David Johnston <polobo(at)yahoo(dot)com>

> Could you index the reverse of the string so the unique part appears first?
>
> On May 14, 2011, at 11:20, InterRob <rob(dot)marjot(at)gmail(dot)com> wrote:
>
> > Dear list,
> >
> > I would be pleased if you could share some thoughts with me on the
> following: say I wish to maintain a table with all distinct character
> sequences (variable length) showing series with strong similarities.
> Example:
> > "abbbabacccdef"
> > "abbbabaccdcdf"
> > "abbbabaccdcgf"
> > ...
> > "qwtrhdffdd"
> > ...
> > "qwtrhdffdds"
> > ...
> > "qwtrhdffddsspp"
> > "qwtrhdffddsspf"
> > "qwtrhdffddssph"
> > "qwtrhdffddsspL"
> > etc.
> >
> > Think of them as ordered values (array-like), as a set having many values
> in common, in the same "elements" (that is: positions; in my application,
> each position has some particular meaning -- the sequence represents a set
> of particular settings)
> >
> > CREATE TABLE textseqs(txtseq TEXT)
> >
> > What would be an efficient approach in enforcing a UNIQUE constraint?
> >
> > I was thinking of using hashbuckets in a b-tree:
> > CREATE UNIQUE INDEX ON textseqs USING BTREE( hashtext(txtseq), txtseq )
> >
> > This index would "cache" hashes for each row. Upon inserting of a new
> row, traversing the index involves the comparison of two single integers for
> each node in the b-tree, until the actual hash value (if it exists) was
> reached. Then, only within that bucket (the hashes won't be unique), the
> more expensive string comparing is required; involving a sequencial
> comparison of (potentially maaaany) characters. Yet, within that bucket, the
> character series may be expected to show stronger differences than a plain
> sorted list of all values in the table would have, indexed by a b-tree
> index. Wouldn't traversing such a plain (non-composite, single column, on:
> "txtseq") b-tree index involve a sequencial comparison of (potentially many)
> characters *at EACH NODE* of the tree ? Or am I mistaken that each node is
> filled with actual values from the txtseq column?
> >
> > Thank you for your input!
> >
> > Cheers,
> > Rob
>
>

Browse pgsql-general by date

  From Date Subject
Next Message Daron Ryan 2011-05-15 13:43:31 Inserting / Selecting / Getting the Index.
Previous Message David Johnston 2011-05-14 17:11:01 Re: UNIQUE constraint on character sequences