Re: Removing redundant itemsets

From:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To:	Allan Kamau <allank(at)sanbi(dot)ac(dot)za>
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: Removing redundant itemsets
Date:	2008-03-31 10:53:28
Message-ID:	47F0C2A8.9010008@postnewspapers.com.au
Lists:	pgsql-sql

Allan Kamau wrote:
> Hi all,
> I have a list of purchases (market basket) and I would like to select
> non-redundant longest possible patterns by eliminating
> (creating/populating other table to contain only non-redundant itemsets)
> purchases having item lists which are fully included in at least one
> other purchase.

Here's a possibly slow and surely ugly solution (I think it's right,
though I haven't done more than passing testing):

CREATE VIEW togo_as_arr AS
SELECT a.tid,
       ARRAY(SELECT item FROM togo b WHERE b.tid = a.tid ORDER BY item)
           AS items
FROM togo a
GROUP BY tid;

SELECT arr_a.tid AS redundant_tid, arr_b.tid AS contained_by
FROM togo_as_arr arr_a
CROSS JOIN togo_as_arr arr_b
WHERE arr_a.tid <> arr_b.tid
  AND arr_a.items <@ arr_b.items;

(The view isn't necessary, but it does improve the readability of the query.)

It collects each purchase's items into an array, then finds any
purchase whose item array is wholly contained by the item array of
another purchase.
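
To see what the query reports, here is a small made-up data set to try in
a scratch database. The rows are hypothetical and the column types of togo
(integer tid, text item) are an assumption; only the table and column names
come from the queries above.

```sql
-- Hypothetical sample data; the column types are assumed, not taken
-- from the original post.
CREATE TABLE togo (tid integer, item text);

INSERT INTO togo (tid, item) VALUES
    (1, 'a'), (1, 'b'), (1, 'c'),   -- purchase 1: {a,b,c}
    (2, 'a'), (2, 'b'),             -- purchase 2: {a,b}
    (3, 'c'), (3, 'd');             -- purchase 3: {c,d}
```

With the view defined as above, the containment query should return a single
row for this data: redundant_tid = 2, contained_by = 1, since {a,b} is
contained in {a,b,c} while {c,d} is not. One thing to watch for: two purchases
with identical item lists would each be reported as contained by the other, so
deleting every redundant_tid would drop both copies.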

I'm *sure* there's a smarter way to do this that avoids the use of
arrays, but I don't seem to be able to come up with one right now. It's
interesting, though, so I might keep fiddling.
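
For illustration only, one array-free formulation is the usual
"relational division" pattern with a double NOT EXISTS: purchase a is
redundant with respect to purchase b if a has no item that b lacks. This
is a sketch, not something from the original message; it is untested
against real data and is not claimed to be any faster than the array
version.

```sql
-- Sketch of an array-free alternative (an assumption, not the method
-- from the original message). Pairs every purchase with every other
-- purchase and keeps the pair when no item of a is missing from b.
SELECT a.tid AS redundant_tid, b.tid AS contained_by
FROM (SELECT DISTINCT tid FROM togo) a
CROSS JOIN (SELECT DISTINCT tid FROM togo) b
WHERE a.tid <> b.tid
  AND NOT EXISTS (
        SELECT 1
        FROM togo ia
        WHERE ia.tid = a.tid
          AND NOT EXISTS (
                SELECT 1
                FROM togo ib
                WHERE ib.tid = b.tid
                  AND ib.item = ia.item));
```

As with the array query, purchases with identical item lists are reported
as containing each other.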

--
Craig Ringer
