From: | Chris Hanks <christopher(dot)m(dot)hanks(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Retrieving multiple columns from a subquery |
Date: | 2012-05-15 02:40:12 |
Message-ID: | CAK7KUdA_tm4YK89f5EMc5G_XAhP80BV8=h8RUc6Ocafn44mbqg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Nothing? Are subqueries just not meant to be used this way?
On Wed, May 9, 2012 at 9:42 AM, Chris Hanks
<christopher(dot)m(dot)hanks(at)gmail(dot)com>wrote:
> Hello -
>
> I have two tables:
>
> CREATE TABLE users
> (
> id serial NOT NULL,
> created_at timestamp with time zone NOT NULL,
> last_seen_at timestamp with time zone NOT NULL,
> -- some other columns...
> )
>
> CREATE TABLE emails
> (
> user_id integer NOT NULL,
> address text NOT NULL,
> created_at timestamp with time zone NOT NULL,
> confirmed_at timestamp with time zone,
> -- some other columns...
> CONSTRAINT emails_user_id_fkey FOREIGN KEY (user_id)
> REFERENCES users (id) MATCH SIMPLE
> ON UPDATE CASCADE ON DELETE CASCADE
> )
>
> The gist is that a single user can be related to multiple emails, and some
> email addresses are confirmed (they've clicked a link I've sent there, so I
> know it's valid) and some aren't.
>
> Routinely, when I'm fetching users from the db I also want to get the best
> email address for each user. That is, the email address that they've
> confirmed the most recently, or failing that, the one that they created the
> most recently. I've been doing this via a subselect:
>
> SELECT *,
> (SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY
> "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email"
> FROM "users"
>
> I like the subquery approach because I can use my ORM to easily append it
> to whatever query I'm running against the users table (whether I'm looking
> up one user or many), without having to do an explicit join and trim out
> the unnecessary rows. Also, in the future I'm planning on adding additional
> subqueries to get (for example) each user's current subscription status,
> and I'm afraid that the joins will get ungainly. Besides, I find subqueries
> much easier to reason about than joins.
>
> My problem is that now I need to get not only the best email's address,
> but whether it is confirmed (whether confirmed_at is not null). My first
> attempt was to simply repeat the subquery:
>
> SELECT *,
> (SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY
> "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email",
> ((SELECT "confirmed_at" FROM "emails" WHERE ("user_id" = "id") ORDER BY
> "confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) IS NOT NULL) AS
> "best_email_confirmed"
> FROM "users"
>
> I had hoped Postgres would recognize that the two subqueries were
> identical, but judging from the explain output from my development database
> it's not, and it's simply running the subquery twice instead:
>
> "Seq Scan on users (cost=0.00..333.65 rows=13 width=81)"
> " SubPlan 1"
> " -> Limit (cost=12.79..12.79 rows=1 width=48)"
> " -> Sort (cost=12.79..12.80 rows=5 width=48)"
> " Sort Key: public.emails.confirmed_at,
> public.emails.created_at"
> " -> Bitmap Heap Scan on emails (cost=4.29..12.76 rows=5
> width=48)"
> " Recheck Cond: (user_id = users.id)"
> " -> Bitmap Index Scan on emails_pkey
> (cost=0.00..4.29 rows=5 width=0)"
> " Index Cond: (user_id = users.id)"
> " SubPlan 2"
> " -> Limit (cost=12.79..12.79 rows=1 width=16)"
> " -> Sort (cost=12.79..12.80 rows=5 width=16)"
> " Sort Key: public.emails.confirmed_at,
> public.emails.created_at"
> " -> Bitmap Heap Scan on emails (cost=4.29..12.76 rows=5
> width=16)"
> " Recheck Cond: (user_id = users.id)"
> " -> Bitmap Index Scan on emails_pkey
> (cost=0.00..4.29 rows=5 width=0)"
> " Index Cond: (user_id = users.id)"
>
> It would be ideal if I could pull both results from the same subquery,
> something like:
>
> SELECT *,
> (SELECT "address", "confirmed_at" IS NOT NULL FROM "emails" WHERE
> ("user_id" = "id") ORDER BY "confirmed_at" DESC NULLS LAST, "created_at"
> DESC LIMIT 1) AS ("best_email", "best_email_confirmed")
> FROM "users"
>
> But this isn't valid syntax. I tried putting the subquery under a FROM
> clause, but it won't work with my "user_id" = "id" condition, and throws
> "ERROR: subquery in FROM cannot refer to other relations of same query
> level". I think CTEs might be an answer, but I'm stuck on 8.3 for the
> foreseeable future, which doesn't support them.
>
> Does anyone have any suggestions?
>
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2012-05-15 03:30:07 | Re: Retrieving multiple columns from a subquery |
Previous Message | Basil Bourque | 2012-05-15 02:31:16 | Cannot find installers for 9.2 Beta |