Re: SQL performance

From: Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Szymon Guz <mabewlun(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: SQL performance
Date: 2013-06-03 14:46:34
Message-ID: CAAXGW-zDR0XA7K5=xO53J45TLGGapHqYdPnCOwqJm3Jd54begA@mail.gmail.com
Lists: pgsql-performance

Thanks Kevin, the blocked column should not be NULLABLE; I will fix that. This
is with a pretty tiny dataset, and I'm a little paranoid that with a large one
I will have issues.
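
Here's roughly what I have in mind for the fix (untested sketch, using the
contact_entity name from the plans; it assumes any existing NULLs in blocked
should be treated as "not blocked"):

-- backfill existing NULLs, then lock the column down
UPDATE contact_entity SET blocked = FALSE WHERE blocked IS NULL;
ALTER TABLE contact_entity ALTER COLUMN blocked SET DEFAULT FALSE;
ALTER TABLE contact_entity ALTER COLUMN blocked SET NOT NULL;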

Believe it or not, the query became faster when I put the tests for user_id
IS NOT NULL in there (and added an index for that) than without the tests
and index.

It kinda makes me wonder if (from a performance perspective) I should
change the schema to pull user_id out of contacts and create a related
table with {contacts.id, user_id} where user_id is never null.
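
Something like this is what I'm picturing, just as a sketch (contact_links
is a made-up name):

CREATE TABLE contact_links
(
    contact_id BIGINT PRIMARY KEY REFERENCES contacts ( id ) ON DELETE CASCADE,
    user_id    BIGINT NOT NULL REFERENCES app_users ( id ) ON DELETE CASCADE
);
CREATE INDEX idx_contact_links_user ON contact_links ( user_id );

Since user_id can never be NULL there, the old ON DELETE SET NULL just
becomes deleting the link row, and the partial-index games on user_id
IS NOT NULL go away.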

On Mon, Jun 3, 2013 at 7:26 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

> Robert DiFalco <robert(dot)difalco(at)gmail(dot)com> wrote:
>
> > CREATE TABLE contacts
> > (
> >     id BIGINT PRIMARY KEY NOT NULL, -- generated
> >     blocked BOOL,
> >     owner_id BIGINT NOT NULL,
> >     user_id BIGINT,
> >     FOREIGN KEY ( owner_id ) REFERENCES app_users ( id ) ON DELETE CASCADE,
> >     FOREIGN KEY ( user_id ) REFERENCES app_users ( id ) ON DELETE SET NULL
> > );
> > CREATE INDEX idx_contact_owner ON contacts ( owner_id );
> > CREATE INDEX idx_contact_mutual ON contacts ( owner_id, user_id )
> >     WHERE user_id IS NOT NULL AND NOT blocked;
>
> Well, the first thing I note is that "blocked" can be NULL. You
> exclude rows from the result where it IS NULL in either row. That
> may be what you really want, but it seems worth mentioning. If you
> don't need to support missing values there, you might want to add a
> NOT NULL constraint. If it should be NULL when user_id is, but not
> otherwise, you might want a row-level constraint. You might shave
> a tiny amount off the runtime by getting rid of the redundant tests
> for NOT NULL on user_id; a comparison cannot evaluate to TRUE with
> either = or <> if either (or both) values are NULL.
>
> > explain analyze verbose
> > select c.user_id
> > from contact_entity c left outer join contact_entity c1 on c1.owner_id = c.user_id and c1.user_id = c.owner_id
> > where NOT c.blocked AND NOT c1.blocked AND c.owner_id = 24 AND c.user_id != 24
> > AND c.user_id IS NOT NULL AND c1.user_id IS NOT NULL
> > group by c.user_id;
>
> > Group  (cost=0.00..9.00 rows=1 width=8) (actual time=0.170..0.301 rows=8 loops=1)
> >   Output: c.user_id
> >   ->  Merge Join  (cost=0.00..9.00 rows=1 width=8) (actual time=0.166..0.270 rows=17 loops=1)
> >         Output: c.user_id
> >         Merge Cond: (c.user_id = c1.owner_id)
> >         ->  Index Scan using idx_contact_mutual on public.contact_entity c  (cost=0.00..5.10 rows=2 width=16) (actual time=0.146..0.164 rows=11 loops=1)
> >               Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id
> >               Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))
> >               Filter: (c.user_id <> 24)
> >               Rows Removed by Filter: 1
> >         ->  Index Scan using idx_contact_mutual on public.contact_entity c1  (cost=0.00..6.45 rows=1 width=16) (actual time=0.012..0.049 rows=18 loops=1)
> >               Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id
> >               Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))
> > Total runtime: 0.388 ms
>
> > explain analyze verbose
> > select c.user_id
> > from contact_entity c
> > where c.owner_id=24 and c.user_id<>24 and c.user_id IS NOT NULL and NOT c.blocked and (exists(
> > select 1
> > from contact_entity c1
> > where NOT c1.blocked and c1.owner_id=c.user_id and c1.user_id IS NOT NULL and c1.user_id=c.owner_id))
> > group by c.user_id;
>
> > Group  (cost=0.00..9.00 rows=1 width=8) (actual time=0.048..0.159 rows=8 loops=1)
> >   Output: c.user_id
> >   ->  Merge Semi Join  (cost=0.00..9.00 rows=1 width=8) (actual time=0.044..0.137 rows=9 loops=1)
> >         Output: c.user_id
> >         Merge Cond: (c.user_id = c1.owner_id)
> >         ->  Index Scan using idx_contact_mutual on public.contact_entity c  (cost=0.00..5.10 rows=2 width=16) (actual time=0.024..0.042 rows=11 loops=1)
> >               Output: c.id, c.blocked, c.first_name, c.last_name, c.owner_id, c.user_id
> >               Index Cond: ((c.owner_id = 24) AND (c.user_id IS NOT NULL))
> >               Filter: (c.user_id <> 24)
> >               Rows Removed by Filter: 1
> >         ->  Index Scan using idx_contact_mutual on public.contact_entity c1  (cost=0.00..6.45 rows=1 width=16) (actual time=0.011..0.047 rows=16 loops=1)
> >               Output: c1.id, c1.blocked, c1.first_name, c1.last_name, c1.owner_id, c1.user_id
> >               Index Cond: ((c1.user_id IS NOT NULL) AND (c1.user_id = 24))
> > Total runtime: 0.224 ms
>
> So, it looks like you can get about 3000 to 4000 of these per
> second on a single connection -- at least in terms of server-side
> processing. Were you expecting more than that?
>
> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
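
P.S. Regarding the row-level constraint you mention: if I did keep blocked
nullable, I believe the rule "blocked is NULL exactly when user_id is" would
look something like this (untested sketch):

-- blocked must be NULL exactly when user_id is NULL
ALTER TABLE contact_entity
    ADD CONSTRAINT blocked_null_matches_user_id
    CHECK ( (blocked IS NULL) = (user_id IS NULL) );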
