Quick Links

filter duplicates by priority

From:	"Clark Slater" <pg(at)slatech(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	filter duplicates by priority
Date:	2009-07-14 14:04:12
Message-ID:	52376.96.236.139.176.1247580252.squirrel@webmail8.pair.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hello-

I am trying to use DISTINCT ON to filter out *potential* duplicate values
from a set of sub queries. There are certain cases where there can be
repetitive part numbers that are priced differently. I'm trying to start
with the full list, ordered by priority, and then remove any repeats that
have a lesser priority.

SELECT DISTINCT ON (part_number) * FROM (
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
UNION ALL
SELECT part_number, priority FROM ...
) AS filter_duplicates ORDER BY priority,part_number

The above statement does not work because if I ORDER BY
priority,part_number then I have to DISTINCT ON (priority,part_number).
But DISTINCT ON (priority, part_number) does not remove the repeated rows
because the same part_number with a different priority becomes a distinct
tuple.

Any suggestions are appreciated.

--------------------------------------

A more detailed explanation of my problem follows:

I am working on an e-commerce system that has different lists of products
which contain many of the same products, at different prices. When a user
searches for a certain set of part numbers, I would like the resulting
products (and prices) to come from one of the lists, according to the
list's priority. Each user can have a different set of lists and
priorities.

Table: product_lists
id | name | priority | user_id
-----+------------------------------+----------+----------
5 | General List of Products | 2 | 23
3 | Different List of Products | 3 | 23
150 | Customer-Specific Products | 1 | 23

Table: products
product_list_id | part_number | price
-----------------+-------------+--------
3 | 92298A | 123.38
5 | 92298A | 111.04
3 | C39207 | 78.38
150 | C39207 | 67.93

Below is a simplified example of the structure of the query I am working
with. I realize that in this case, I could re-factor all of this into one
statement, but each sub-query in the real case has a more complex set of
joins that determines the price. The pricing joins from one sub-query to
the next vary, so a collection of sub-queries seemed to be a logical
solution. Some part numbers are found in only one of the lists, while
other part numbers are repeated across lists at different prices.

This is what I would *like* to say:

SELECT DISTINCT ON (part_number) * FROM (

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id=product_lists.id
AND product_list_id=150
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=5
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

UNION ALL

SELECT
product_list_id,part_number,price,priority
FROM products, product_lists
WHERE
product_list_id= product_lists.id
AND product_list_id=3
AND (part_number='92298A' OR part_number='C39207' OR part_number=...)

) AS filter_duplicates ORDER BY priority,part_number

I need to ORDER BY priority so that, in the case of duplicates, the
product from the desired list is returned first. Then the purpose of
DISTINCT ON is to filter out any duplicate part numbers that have a lesser
priority. But, the above statement fails because the DISTINCT ON
expression must match the leftmost ORDER BY expression. However,
inserting the priority into the DISTINCT ON expression means that all of
the resulting tuples are unique, even though the part_number is the same.

Responses

Re: filter duplicates by priority at 2009-07-14 15:30:38 from Hartman, Matthew
Re: filter duplicates by priority at 2009-07-14 15:48:56 from Tom Lane
Re: filter duplicates by priority at 2009-07-14 15:56:30 from Martijn van Oosterhout
Re: filter duplicates by priority at 2009-07-14 16:24:13 from Sam Mason

Browse pgsql-general by date

	From	Date	Subject
Next Message	Dan Armbrust	2009-07-14 14:06:19	Re: Checkpoint Tuning Question
Previous Message	Alvaro Herrera	2009-07-14 13:52:29	Re: Best practices for moving UTF8 databases