Re: Efficient sorting the results of a join, without denormalization

From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: "Glen M(dot) Witherington" <glen(at)fea(dot)st>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Efficient sorting the results of a join, without denormalization
Date: 2015-05-31 20:09:58
Message-ID: 20150531160958.383c50f0f5db120032f3b483@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, 31 May 2015 04:50:00 -0500
"Glen M. Witherington" <glen(at)fea(dot)st> wrote:
>
> On Sun, May 31, 2015, at 12:53 AM, Tom Lane wrote:
> > "Glen M. Witherington" <glen(at)fea(dot)st> writes:
> > > And here's the query I want to do, efficiently:
> >
> > > SELECT * FROM c
> > > JOIN b ON b.id = c.b_id
> > > JOIN a ON a.id = b.a_id
> > > WHERE a.id = 3
> > > ORDER BY b.created_at DESC
> > > LIMIT 10
> >
> > At least for that dummy data, this seems sufficient:
> >
> > regression=# create index on b (a_id, created_at);
> > CREATE INDEX
> > regression=# explain analyze SELECT * FROM c
> > JOIN b ON b.id = c.b_id
> > JOIN a ON a.id = b.a_id
> > WHERE a.id = 3
> > ORDER BY b.created_at DESC
> > LIMIT 10;
> > QUERY
> > PLAN
> > ------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=0.14..21.95 rows=10 width=64) (actual time=0.064..1.176
> > rows=10 loops=1)
> > -> Nested Loop (cost=0.14..436079.81 rows=200000 width=64) (actual
> > time=0.063..1.173 rows=10 loops=1)
> > Join Filter: (b.id = c.b_id)
> > Rows Removed by Join Filter: 1218
> > -> Nested Loop (cost=0.14..9.81 rows=20 width=40) (actual
> > time=0.035..0.035 rows=1 loops=1)
> > -> Index Scan Backward using b_a_id_created_at_idx on b
> > (cost=0.14..8.49 rows=20 width=24) (actual
> > time=0.019..0.019 rows=1 loops=1)
> > Index Cond: (a_id = 3)
> > -> Materialize (cost=0.00..1.07 rows=1 width=16) (actual
> > time=0.013..0.013 rows=1 loops=1)
> > -> Seq Scan on a (cost=0.00..1.06 rows=1 width=16)
> > (actual time=0.009..0.009 rows=1 loops=1)
> > Filter: (id = 3)
> > Rows Removed by Filter: 2
> > -> Materialize (cost=0.00..27230.00 rows=1000000 width=24)
> > (actual time=0.008..0.811 rows=1228 loops=1)
> > -> Seq Scan on c (cost=0.00..16370.00 rows=1000000
> > width=24) (actual time=0.007..0.310 rows=1228 loops=1)
> > Planning time: 0.796 ms
> > Execution time: 1.390 ms
> > (15 rows)
> >
> > regards, tom lane
>
> Wow, sorry I screwed up the query. It should be:
>
> ORDER BY c.created_at DESC
>
> Not b, or as you noted its trivial to index. Sorry!

Creating an index on c.created_at sped things up by a factor of over
1000, which caused the case you defined to run in ~0.5ms for me.

--
Bill Moran <wmoran(at)potentialtech(dot)com>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Glen M. Witherington 2015-05-31 23:16:24 Re: Efficient sorting the results of a join, without denormalization
Previous Message Francisco Olarte 2015-05-31 18:16:31 Re: Efficient sorting the results of a join, without denormalization