Re: Help Optimizing a Summary Query

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Help Optimizing a Summary Query
Date: 2014-12-09 19:44:45
Message-ID: CAO_YK0Uk3GyO3PxJwd-h_k3GAT8S15WMNhE5OE1kgwA=WohYHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
wrote:

> I'm sorry, I missed a JOIN on the second variation. It is:
>
> SELECT u.id, u.name, u.imageURL, u.bio,
> CASE
> WHEN f.friend_id IS NOT NULL THEN 'isFriend'
> WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
> WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
> ELSE 'none'
> END AS 'friendStatus',
> (SELECT COUNT(1) AS d
> FROM friends f1
> JOIN friends f2 ON f1.fiend_id = f2.friend_id
> WHERE f1.user_id = 33 AND f2.user_id = u.id)
> FROM users u
> *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id
> <http://u.id>*
> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
> WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;
>
>
> On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
> wrote:
>
>> I have users, friends, and friend_requests. I need a query that
>> essentially returns a summary containing:
>>
>> * user (name, imageURL, bio, ...)
>> * Friend status (relative to an active user)
>> * Is the user a friend of the active user?
>> * Has the user sent a friend request to the active user?
>> * Has the user received a friend request from the active user?
>> * # of mutualFriends
>> * Exclude the active user from the result set.
>>
>> So I have mocked this up two ways but both have complicated query plans
>> that will be problematic with large data sets. I'm thinking that my lack of
>> deep SQL knowledge is making me miss the obvious choice.
>>
>> Here's my two query examples:
>>
>> SELECT u.id, u.name, u.imageURL, u.bio,
>> CASE
>> WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
>> f.friend_id = u.id) THEN 'isFriend'
>> WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
>> AND s.from_id = u.id) THEN 'hasSentRequest'
>> WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
>> AND r.from_id = 33) THEN 'hasReceivedRequest'
>> ELSE 'none'
>> END AS "friendStatus",
>> (SELECT COUNT(1)
>> FROM friends f1
>> JOIN friends f2 ON f1.friend_id = f2.friend_id
>> WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
>> FROM users u
>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>
>> SELECT u.id, u.name, u.imageURL, u.bio,
>> CASE
>> WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>> WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>> WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>> ELSE 'none'
>> END AS 'friendStatus',
>> (SELECT COUNT(1) AS d
>> FROM friends f1
>> JOIN friends f2 ON f1.fiend_id = f2.friend_id
>> WHERE f1.user_id = 33 AND f2.user_id = u.id)
>> FROM users u
>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>
>> 33 is just the id of the active user I am using for testing. The WHERE
>> clause could be anything. I'm just using "u.name" here but I'm more
>> concerned about the construction of the result set than the WHERE clause.
>> These have more or less similar query plans, nothing that would change
>> things factorially. Is this the best I can do or am I missing the obvious?
>>
>> Here are the tables:
>>
>>
>> CREATE TABLE users (
>> id BIGINT,
>> name VARCHAR,
>> imageURL VARCHAR
>> created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
>> phone_natl BIGINT, /* National Phone Number */
>> country_e164 SMALLINT, /* E164 country code */
>> email VARCHAR(255),
>> PRIMARY KEY (id),
>> UNIQUE (email),
>> UNIQUE (phone_natl, country_e164)
>> );
>>
>>
>> CREATE TABLE friends (
>> user_id BIGINT,
>> friend_id BIGINT,
>> PRIMARY KEY (user_id, user_id),
>> FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
>> FOREIGN KEY (friend_id) REFERENCES users(id) ON DELETE CASCADE
>> );
>> CREATE INDEX idx_friends_friend ON friends(friend_id);
>>
>> CREATE TABLE friend_requests (
>> from_id BIGINT,
>> to_id BIGINT,
>> created TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
>> PRIMARY KEY (from_id, user_id),
>> FOREIGN KEY (from_id) REFERENCES users(id) ON DELETE CASCADE,
>> FOREIGN KEY (to_id) REFERENCES users(id) ON DELETE CASCADE
>> );
>> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);
>>
>> Let me know if you guys need anything else.
>>
>>
>

Hello Robert, none of your schemas worked for me, here's a clean version

CREATE TABLE users (
id BIGINT,
name VARCHAR,
imageURL VARCHAR,
created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
phone_natl BIGINT,
country_e164 SMALLINT,
email VARCHAR(255),
PRIMARY KEY (id),
UNIQUE (email),
UNIQUE (phone_natl, country_e164)
);

CREATE TABLE friends (
user_id BIGINT,
friend_id BIGINT,
PRIMARY KEY (user_id, friend_id),
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
FOREIGN KEY (friend_id) REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_friends_friend ON friends(friend_id);

CREATE TABLE friend_requests (
from_id BIGINT,
to_id BIGINT,
created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (from_id, to_id),
FOREIGN KEY (from_id) REFERENCES users(id) ON DELETE CASCADE,
FOREIGN KEY (to_id) REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);

I may look into the query later but here's some thoughts

* you want an index to speed up name searches on user.name, I suggest
pg_trgm + ILIKE as a starting point.
* you really expect more than a billion users? If not (or not in the near
future) use INT instead. This will save you a significant amount of memory
and eventually buy you time later.
* as long as you don't hit the disk for the queries you'll be fine, so make
sure you have enough memory or use read-slaves with smaller working sets.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message FarjadFarid(ChkNet) 2014-12-09 20:01:54 List of shorthand casts
Previous Message John R Pierce 2014-12-09 18:51:54 Re: Streaming Replication - changing IP addresses