Retrieving multiple columns from a subquery

From: Chris Hanks <christopher(dot)m(dot)hanks(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Retrieving multiple columns from a subquery
Date: 2012-05-09 16:42:01
Message-ID: CAK7KUdDcmjF_sWZwXkt6PdfZzSTE=V2ovYF-+6AgGanv3w-zBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello -

I have two tables:

CREATE TABLE users
(
id serial NOT NULL,
created_at timestamp with time zone NOT NULL,
last_seen_at timestamp with time zone NOT NULL,
-- some other columns...
)

CREATE TABLE emails
(
user_id integer NOT NULL,
address text NOT NULL,
created_at timestamp with time zone NOT NULL,
confirmed_at timestamp with time zone,
-- some other columns...
CONSTRAINT emails_user_id_fkey FOREIGN KEY (user_id)
REFERENCES users (id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
)

The gist is that a single user can be related to multiple emails, and some
email addresses are confirmed (they've clicked a link I've sent there, so I
know it's valid) and some aren't.

Routinely, when I'm fetching users from the db I also want to get the best
email address for each user. That is, the email address that they've
confirmed the most recently, or failing that, the one that they created the
most recently. I've been doing this via a subselect:

SELECT *,
(SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY
"confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email"
FROM "users"

I like the subquery approach because I can use my ORM to easily append it
to whatever query I'm running against the users table (whether I'm looking
up one user or many), without having to do an explicit join and trim out
the unnecessary rows. Also, in the future I'm planning on adding additional
subqueries to get (for example) each user's current subscription status,
and I'm afraid that the joins will get ungainly. Besides, I find subqueries
much easier to reason about than joins.

My problem is that now I need to get not only the best email's address, but
whether it is confirmed (whether confirmed_at is not null). My first
attempt was to simply repeat the subquery:

SELECT *,
(SELECT "address" FROM "emails" WHERE ("user_id" = "id") ORDER BY
"confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) AS "best_email",
((SELECT "confirmed_at" FROM "emails" WHERE ("user_id" = "id") ORDER BY
"confirmed_at" DESC NULLS LAST, "created_at" DESC LIMIT 1) IS NOT NULL) AS
"best_email_confirmed"
FROM "users"

I had hoped Postgres would recognize that the two subqueries were
identical, but judging from the explain output from my development database
it's not, and it's simply running the subquery twice instead:

"Seq Scan on users (cost=0.00..333.65 rows=13 width=81)"
" SubPlan 1"
" -> Limit (cost=12.79..12.79 rows=1 width=48)"
" -> Sort (cost=12.79..12.80 rows=5 width=48)"
" Sort Key: public.emails.confirmed_at,
public.emails.created_at"
" -> Bitmap Heap Scan on emails (cost=4.29..12.76 rows=5
width=48)"
" Recheck Cond: (user_id = users.id)"
" -> Bitmap Index Scan on emails_pkey
(cost=0.00..4.29 rows=5 width=0)"
" Index Cond: (user_id = users.id)"
" SubPlan 2"
" -> Limit (cost=12.79..12.79 rows=1 width=16)"
" -> Sort (cost=12.79..12.80 rows=5 width=16)"
" Sort Key: public.emails.confirmed_at,
public.emails.created_at"
" -> Bitmap Heap Scan on emails (cost=4.29..12.76 rows=5
width=16)"
" Recheck Cond: (user_id = users.id)"
" -> Bitmap Index Scan on emails_pkey
(cost=0.00..4.29 rows=5 width=0)"
" Index Cond: (user_id = users.id)"

It would be ideal if I could pull both results from the same subquery,
something like:

SELECT *,
(SELECT "address", "confirmed_at" IS NOT NULL FROM "emails" WHERE
("user_id" = "id") ORDER BY "confirmed_at" DESC NULLS LAST, "created_at"
DESC LIMIT 1) AS ("best_email", "best_email_confirmed")
FROM "users"

But this isn't valid syntax. I tried putting the subquery under a FROM
clause, but it won't work with my "user_id" = "id" condition, and throws
"ERROR: subquery in FROM cannot refer to other relations of same query
level". I think CTEs might be an answer, but I'm stuck on 8.3 for the
foreseeable future, which doesn't support them.

Does anyone have any suggestions?

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Antonio Goméz Soto 2012-05-09 20:50:39 Re: 2 machines, same database, same query, 10 times slower?
Previous Message Guillaume Lelarge 2012-05-09 16:09:39 PostgreSQL Session #4 : Call for Papers