Re: Using contains operator on arrays that have duplicate elements: feedback request.

From: Seref Arikan <serefarikan(at)kurumsalteknoloji(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Using contains operator on arrays that have duplicate elements: feedback request.
Date: 2012-08-08 15:04:28
Message-ID: CA+4ThdqD-wV8NNTaQeAymUJbsJjQY_2BqoC3eLhhuRw5NccdWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks a lot David,
You are right, I want to left to be a superset (or equal at the limit) of
the right set, but taking cardinalities of elements into account. Please do
not spend time on this, since it appears I can't keep the indexing
advantage of array operation under these circumstances.

Actually, the sets are created via concatenating ids that represent node
types in a tree. I wanted to develop a solution using arrays, but maybe I
should look into ltree, which may help me do what I want to do without
losing the advantage of indexing.

Or I can implement some other logic in the business layer that ensures that
duplicate elements are eliminated before inserts and queries with a
replacement strategy that changes second and further occurrences of node
ids on both left and right side of the array operation. This may help me
keep array based approach. I am not sure how ltree would scale in terms of
performance to a few billion rows.

Best regards
Seref

On Wed, Aug 8, 2012 at 3:45 PM, David Johnston <polobo(at)yahoo(dot)com> wrote:

> From: pgsql-general-owner(at)postgresql(dot)org
> [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Seref Arikan
> Sent: Wednesday, August 08, 2012 10:35 AM
> To: PG-General Mailing List
> Subject: [GENERAL] Using contains operator on arrays that have duplicate
> elements: feedback request.
>
> Greetings,
> I've been trying to use contains operator on an array such as {3,4,9,4,5}
> My problem is, I'd like to get rows that has two 4s in them, and contains
> operator seems to work separately for each member of the right operand.
> Therefore
>
> select '{3,4,9,4,5}'::int[] @> '{4,4,4}'
>
> returns true. I'm trying to implement the semantics of the operation so
> that
> the set {4,4,4} is a single parameter to a subset query, which should
> return
> false in the case above.
>
> Is there any way I can force the contains operation to consider cardinality
> of unique members of the set ? Arrays look like to simplest way of
> performing set operations on a group of values, but is there anything else
> that can be an alternative to them?
>
> Best regards
> Seref
>
> ============================================================
>
> The only thing I can think of is to create a function that unnest(s) the
> two
> supplied arrays, groups on the value and performs a count, then LEFT JOINs
> the two grouped relations - limited by a count comparison - and returns
> TRUE/FALSE depending on whether there are any NULLs on the right side of
> the
> join.
>
> I may try a stab and an example of this later if it is still an open
> question but hopefully this will help you.
>
> I do not believe you are going to be able to change the behavior of
> "contains" in the way that you want.
>
> You might also try performing an intersection and then confirm that the
> result is the same as the compare-to array; it seems that you want the
> left-side to be a super-set of the right-side in the example provided.
>
> Apologies if I get some of the left/right/compare-to terminology mixed-up.
> I hope that at least the intent is clear.
>
> HTH
>
> David J.
>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Kevin Grittner 2012-08-08 16:45:40 Re: File system level backup
Previous Message David Johnston 2012-08-08 14:45:00 Re: Using contains operator on arrays that have duplicate elements: feedback request.