Re: Help Optimizing a Summary Query

From: Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
To: Arthur Silva <arthurprs(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Help Optimizing a Summary Query
Date: 2014-12-11 20:52:37
Message-ID: CAAXGW-zN7jHiRPfCqpp-m4CgSeS6V_ZoczDEV9uZgbn7Vq7Kqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Arthur. I don't think there is as big a different between BIGINT and
INTEGER as you think there is. In fact with an extended filesystem you
might not see any difference at all.

As I put in the first emal I am using a GIST index on user.name.

I was really more interested in the LEFT OUTER JOINs vs EXISTS queries and
if there was a better alternative I had not considered.

On Tue, Dec 9, 2014 at 11:44 AM, Arthur Silva <arthurprs(at)gmail(dot)com> wrote:

> On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
> wrote:
>
>> I'm sorry, I missed a JOIN on the second variation. It is:
>>
>> SELECT u.id, u.name, u.imageURL, u.bio,
>> CASE
>> WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>> WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>> WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>> ELSE 'none'
>> END AS 'friendStatus',
>> (SELECT COUNT(1) AS d
>> FROM friends f1
>> JOIN friends f2 ON f1.fiend_id = f2.friend_id
>> WHERE f1.user_id = 33 AND f2.user_id = u.id)
>> FROM users u
>> *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id
>> <http://u.id>*
>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>> WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;
>>
>>
>> On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco <robert(dot)difalco(at)gmail(dot)com
>> > wrote:
>>
>>> I have users, friends, and friend_requests. I need a query that
>>> essentially returns a summary containing:
>>>
>>> * user (name, imageURL, bio, ...)
>>> * Friend status (relative to an active user)
>>> * Is the user a friend of the active user?
>>> * Has the user sent a friend request to the active user?
>>> * Has the user received a friend request from the active user?
>>> * # of mutualFriends
>>> * Exclude the active user from the result set.
>>>
>>> So I have mocked this up two ways but both have complicated query plans
>>> that will be problematic with large data sets. I'm thinking that my lack of
>>> deep SQL knowledge is making me miss the obvious choice.
>>>
>>> Here's my two query examples:
>>>
>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>> CASE
>>> WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
>>> f.friend_id = u.id) THEN 'isFriend'
>>> WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
>>> AND s.from_id = u.id) THEN 'hasSentRequest'
>>> WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
>>> AND r.from_id = 33) THEN 'hasReceivedRequest'
>>> ELSE 'none'
>>> END AS "friendStatus",
>>> (SELECT COUNT(1)
>>> FROM friends f1
>>> JOIN friends f2 ON f1.friend_id = f2.friend_id
>>> WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
>>> FROM users u
>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>>
>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>> CASE
>>> WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>>> WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>>> WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>>> ELSE 'none'
>>> END AS 'friendStatus',
>>> (SELECT COUNT(1) AS d
>>> FROM friends f1
>>> JOIN friends f2 ON f1.fiend_id = f2.friend_id
>>> WHERE f1.user_id = 33 AND f2.user_id = u.id)
>>> FROM users u
>>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>>
>>> 33 is just the id of the active user I am using for testing. The WHERE
>>> clause could be anything. I'm just using "u.name" here but I'm more
>>> concerned about the construction of the result set than the WHERE clause.
>>> These have more or less similar query plans, nothing that would change
>>> things factorially. Is this the best I can do or am I missing the obvious?
>>>
>>> Here are the tables:
>>>
>>>
>>> CREATE TABLE users (
>>> id BIGINT,
>>> name VARCHAR,
>>> imageURL VARCHAR
>>> created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
>>> phone_natl BIGINT, /* National Phone Number */
>>> country_e164 SMALLINT, /* E164 country code */
>>> email VARCHAR(255),
>>> PRIMARY KEY (id),
>>> UNIQUE (email),
>>> UNIQUE (phone_natl, country_e164)
>>> );
>>>
>>>
>>> CREATE TABLE friends (
>>> user_id BIGINT,
>>> friend_id BIGINT,
>>> PRIMARY KEY (user_id, user_id),
>>> FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
>>> FOREIGN KEY (friend_id) REFERENCES users(id) ON DELETE CASCADE
>>> );
>>> CREATE INDEX idx_friends_friend ON friends(friend_id);
>>>
>>> CREATE TABLE friend_requests (
>>> from_id BIGINT,
>>> to_id BIGINT,
>>> created TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
>>> PRIMARY KEY (from_id, user_id),
>>> FOREIGN KEY (from_id) REFERENCES users(id) ON DELETE CASCADE,
>>> FOREIGN KEY (to_id) REFERENCES users(id) ON DELETE CASCADE
>>> );
>>> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);
>>>
>>> Let me know if you guys need anything else.
>>>
>>>
>>
>
> Hello Robert, none of your schemas worked for me, here's a clean version
>
> CREATE TABLE users (
> id BIGINT,
> name VARCHAR,
> imageURL VARCHAR,
> created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
> phone_natl BIGINT,
> country_e164 SMALLINT,
> email VARCHAR(255),
> PRIMARY KEY (id),
> UNIQUE (email),
> UNIQUE (phone_natl, country_e164)
> );
>
>
> CREATE TABLE friends (
> user_id BIGINT,
> friend_id BIGINT,
> PRIMARY KEY (user_id, friend_id),
> FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
> FOREIGN KEY (friend_id) REFERENCES users(id) ON DELETE CASCADE
> );
> CREATE INDEX idx_friends_friend ON friends(friend_id);
>
> CREATE TABLE friend_requests (
> from_id BIGINT,
> to_id BIGINT,
> created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
> PRIMARY KEY (from_id, to_id),
> FOREIGN KEY (from_id) REFERENCES users(id) ON DELETE CASCADE,
> FOREIGN KEY (to_id) REFERENCES users(id) ON DELETE CASCADE
> );
> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);
>
> I may look into the query later but here's some thoughts
>
> * you want an index to speed up name searches on user.name, I suggest
> pg_trgm + ILIKE as a starting point.
> * you really expect more than a billion users? If not (or not in the near
> future) use INT instead. This will save you a significant amount of memory
> and eventually buy you time later.
> * as long as you don't hit the disk for the queries you'll be fine, so
> make sure you have enough memory or use read-slaves with smaller working
> sets.
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Arthur Silva 2014-12-11 21:31:29 Re: Help Optimizing a Summary Query
Previous Message Tom Lane 2014-12-11 20:46:18 Re: Out of memory condition