Re: Tricky SQL - assistance appreicated. DDL and DML supplied.

From: Matt Clement <mattpc9(at)gmail(dot)com>
To: Pál Teleki <ellenallhatatlan(at)gmail(dot)com>
Cc: pgsql-novice novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Tricky SQL - assistance appreicated. DDL and DML supplied.
Date: 2016-07-03 02:29:37
Message-ID: CAEhQjh5w-sGwr2sddDuDmeox+nr71P5QKEWDc1C1ef_BRsF2mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Hey there,

I think you were pretty close!

Let's start with the query that you came up with:

SELECT t1.c_id AS theid1, t1.guid AS theguid1, MAX(t1.ts) AS mydate1 FROM
ex t1
INNER JOIN ex t2
ON t1.c_id = t2.c_id AND t1.guid = t2.guid
GROUP BY t1.c_id, t1.guid
ORDER BY t1.guid, MAX(t1.ts)

This will get the c_id, guid, and ts of the rows you want to keep. I'll
clean this up a bit, as the join isn't really useful at this point (none of
the selected columns are from the joined table).

SELECT c_id, guid, MAX(ts) AS mydate1 FROM ex
GROUP BY c_id, guid
ORDER BY guid, MAX(ts);

We can use this as a subquery to find the ex_ids that we want to keep and
bring back the same join conditions that you had before:

SELECT ex_id from ex t1
INNER JOIN (
SELECT c_id, guid, MAX(ts) AS mydate1 FROM ex
GROUP BY c_id, guid
ORDER BY guid, MAX(ts);
) t2 ON t1.c_id = t2.c_id AND t1.guid = t2.guid AND t1.ts = t2.mydate1

Now we have all of the ex_id's that we want to keep. We can use a simple
NOT IN condition to get all of the rows that should be deleted:

SELECT * from ex
WHERE ex_id NOT IN (
SELECT ex_id from ex t1
INNER JOIN (
SELECT c_id, guid, MAX(ts) AS mydate1 FROM ex
GROUP BY c_id, guid
ORDER BY guid, MAX(ts);
) t2 ON t1.c_id = t2.c_id AND t1.guid = t2.guid AND t1.ts = t2.mydate1
)

If you actually want to delete those rows you could do a DELETE FROM
instead of SELECT FROM. Here's the query that I actually ran and the
resulting rows:

matt=# DELETE from ex where ex_id not in (SELECT t1.ex_id FROM ex t1 join
> (SELECT c_id, guid, max(ts) mts from ex group by c_id, guid) as t2 on
> t1.c_id = t2.c_id AND t1.guid = t2.guid AND t1.ts = t2.mts);
> DELETE 4
> matt=# SELECT * from ex;
> ex_id | c_id | guid | supplier | ts
> -------+------+------+----------+---------------------
> 3 | 1 | xxxx | 50 | 2016-07-15 22:05:01
> 4 | 1 | xxxx | 50 | 2016-07-15 22:05:01
> 7 | 2 | yyyy | 71 | 2016-07-17 22:05:01
> 8 | 2 | yyyy | 74 | 2016-07-17 22:05:01
> 9 | 3 | zzzz | 60 | 2016-07-01 22:05:01
> 10 | 4 | aaaa | 61 | 2016-07-01 22:05:01
> (6 rows)
>

I believe this solution should work with mysql as well, although I've only
done some light testing.
Hope that helps!

On Sat, Jul 2, 2016 at 8:05 PM, Pál Teleki <ellenallhatatlan(at)gmail(dot)com>
wrote:

>
>
>
> Hi all,
>
> I have an SQL problem and I'm stumped.
>
>
> I have the following data (see DDL and DML at end of post)
>
> "ex_id"; "c_id"; "guid"; "supplier"; "ts"
> -------- ------- -------- ----------- ---------------------
> 42; 4; "aaaa"; 61; "2016-07-01 22:05:01"
> 34; 1; "xxxx"; 50; "2016-07-01 22:05:01"
> 33; 1; "xxxx"; 54; "2016-07-01 22:05:01"
> 35; 1; "xxxx"; 50; "2016-07-15 22:05:01"
> 36; 1; "xxxx"; 50; "2016-07-15 22:05:01"
> 37; 2; "yyyy"; 78; "2016-07-01 22:05:01"
> 38; 2; "yyyy"; 79; "2016-07-01 22:05:01"
> 39; 2; "yyyy"; 71; "2016-07-17 22:05:01"
> 40; 2; "yyyy"; 74; "2016-07-17 22:05:01"
> 41; 3; "zzzz"; 60; "2016-07-01 22:05:01"
>
>
> and the resulting table - i.e. the records I want to retain in the table
> are:
>
>
> "ex_id"; "c_id"; "guid"; "supplier"; "ts"
> -------- ------- -------- ----------- ---------------------
> 42; 4; "aaaa"; 61; "2016-07-01 22:05:01"
> 35; 1; "xxxx"; 50; "2016-07-15 22:05:01"
> 36; 1; "xxxx"; 50; "2016-07-15 22:05:01"
> 39; 2; "yyyy"; 71; "2016-07-17 22:05:01"
> 40; 2; "yyyy"; 74; "2016-07-17 22:05:01"
> 41; 3; "zzzz"; 60; "2016-07-01 22:05:01"
>
>
>
> i.e. Take the combination of c_id, guid and date which is the last, no
> matter
> how many records there are with a given combination of same. The (guid)
> 'aaaa'
> and the 'zzzz' records obviously stay since there's only 1 of each.
>
> But the records with 'xxxx' and 'yyyy' have to be "pruned" - only keep
> those
> with the latest date where c_id matches and guid matches.
>
> 1) This query will also have to work with MySQL, so CTEs and Analytic
> functions
> are out - good old fashioned SQL only.
>
> 2) Ideally, I would like two queries, one under the assumption that ex_id
> is
> sequential with time and the other that ex_id is not necessarily in sync
> with
> the timestamp.
>
> 3) This appears to be a "top-n-per-group" problem of some sort - I would
> appreciate an explanation of the query, not just the raw query itself -
> I'm
> trying to learn rather than just a bald answer. References, URLs &c
> appreciated.
>
> Just to show that I'm not totally lazy :-), I have formulated what I think
> is
> the first step, but I'm stuck. It would be easy enough to do this with
> CTEs but,
> as I've said, the query also has to work with MySQL. If you want to throw
> in
> solutions based on CTEs and/or SET operators and/or Analytic functions,
> for pedagological purposes, great, but I do need a MySQL solution.
>
>
>
> SELECT t1.c_id AS theid1, t1.guid AS theguid1, MAX(t1.ts) AS mydate1 FROM
> ex t1
> INNER JOIN ex t2
> ON t1.c_id = t2.c_id AND t1.guid = t2.guid
> GROUP BY t1.c_id, t1.guid
> ORDER BY t1.guid, MAX(t1.ts)
>
> "theid1"; "theguid1"; "mydate1"
> --------- ----------- ---------------------
> 4; "aaaa"; "2016-07-01 22:05:01"
> 1; "xxxx"; "2016-07-15 22:05:01"
> 2; "yyyy"; "2016-07-17 22:05:01"
> 3; "zzzz"; "2016-07-01 22:05:01"
>
>
>
> Now, these are the records that **shouldn't** be deleted - without
> the supplier data - which also has to be retained - i.e. I need to
> retain more records than this - but this should be a start. The data
> are correct for guid, datetime and c_id.
>
>
> I just can't figure out how to formulate the statement to delete the
> other records.
>
> Maybe with SET operators, if MySQL had those?
>
> Anyway, all and any input appreciated. If any further info. is required,
> please
> let me know.
>
>
> ============= DDL and DML for the table ===============
>
>
> CREATE TABLE ex
> (
> ex_id serial,
> c_id int,
> guid varchar(32),
> supplier int,
> ts timestamp,
> PRIMARY KEY (ex_id)
> );
>
>
>
> insert into ex (c_id, guid, supplier, ts) VALUES(1, 'xxxx', 54,
> '2016-07-01 22:05:01');
> insert into ex (c_id, guid, supplier, ts) VALUES(1, 'xxxx', 50,
> '2016-07-01 22:05:01');
> insert into ex (c_id, guid, supplier, ts) VALUES(1, 'xxxx', 50,
> '2016-07-15 22:05:01');
> insert into ex (c_id, guid, supplier, ts) VALUES(1, 'xxxx', 50,
> '2016-07-15 22:05:01');
>
>
> insert into ex (c_id, guid, supplier, ts) VALUES(2, 'yyyy', 78,
> '2016-07-01 22:05:01');
> insert into ex (c_id, guid, supplier, ts) VALUES(2, 'yyyy', 79,
> '2016-07-01 22:05:01');
> insert into ex (c_id, guid, supplier, ts) VALUES(2, 'yyyy', 71,
> '2016-07-17 22:05:01');
> insert into ex (c_id, guid, supplier, ts) VALUES(2, 'yyyy', 74,
> '2016-07-17 22:05:01');
>
> insert into ex (c_id, guid, supplier, ts) VALUES(3, 'zzzz', 60,
> '2016-07-01 22:05:01');
> insert into ex (c_id, guid, supplier, ts) VALUES(4, 'aaaa', 61,
> '2016-07-01 22:05:01');
>
>
>
> --
>
> Pál Teleki
>

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2016-07-03 15:18:03 Re: Tricky SQL - assistance appreicated. DDL and DML supplied.
Previous Message Pál Teleki 2016-07-03 00:05:52 Tricky SQL - assistance appreicated. DDL and DML supplied.