Re: Creating tons of tables to support a query

From: Jan Ploski <jpljpl(at)gmx(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating tons of tables to support a query
Date: 2002-09-09 16:47:55
Message-ID: 14629117.1031590075510.JavaMail.jpl@remotejava
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Sep 09, 2002 at 11:41:38AM -0400, Tom Lane wrote:
> Yes, it makes sense that for a little-used section that way wouldn't be
> very efficient. I would suggest that you want to use an index on
> (sectionID, dateSent), and that the way to make the system do the
> right thing is
>
> select msgnum from message where
> sectionID = ? and
> dateSent > ?
> order by sectionID, dateSent
> limit 1;
>
> Without the extra ORDER BY clause, the planner is not smart enough to
> see that the requested ordering actually matches the index ordering.

Tom,

thanks you for advice. Now I get performance comparable to that
when using a partial index with "workarounds", i.e.

select msgnum from message where
sectionID=9 and
dateSent>'2000-11-12 02:05:35.94'
order by sectionID,dateSent limit 1;

works equally fast with an index on message(sectionID, dateSent)
as through a partial index on message where sectionID = 9.

When executed on the empty section 241, the statement is way
faster than the partial index based solution (0.67 seconds vs 8.5 seconds),
probably because I had to resort to trickery to make it work
reasonably at all.

What I still cannot grasp is why

select msgnum into v_cnt from message where sectionID = 241
order by dateSent desc limit 1;

is so much faster than

v_sid := 241;
select msgnum into v_cnt from message where sectionID = v_sid
order by dateSent desc limit 1;

as I mentioned in an earlier message. In fact, to get the partial index
solution up to decent performance, I had to write something like

delete from tmp;
execute
''insert into tmp select msgnum into v_cnt from message where sectionID = '' ||
v_sid || '' oder by dateSent desc limit 1'';
select max(cnt) into v_cnt from tmp;

Following Richard's suggestion, I turned query debugging on in
postgresql.conf. What jumps into my eyes in the log is

Shared blocks: 132 read (sectionID = 241)
vs
Shared blocks: 1517 read (sectionID = v_sid)

in postgres usage stats output after executing the query.

When I wrap the select statement into the awkward execute,
I get "152 read", which is still much, much better than "1517 read",
if that (apart from execution time) can be taken as an indicator
for performance. Do you have any clues?

> Another possible gotcha is that depending on datatype details the
> planner might be using only one of the two index columns. As far
> as I noticed, you didn't tell us the exact column datatypes or the
> exact form in which the comparison values are supplied?

The column types are integer for sectionID is and timestamp for dateSent.
I am passing parameters of these types into a PL/pgSQL procedure, which then
executes a "select into" with these parameters in the where clause.

Take care -
JPL

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2002-09-09 16:56:33 Re: Debugging plpgsql functions
Previous Message Alvaro Herrera 2002-09-09 16:24:52 Re: match one word