Re: Queue table that quickly grows causes query planner to choose poor plan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Wheeler <dwheeler(at)dgitsystems(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Queue table that quickly grows causes query planner to choose poor plan
Date: 2018-06-27 18:27:33
Message-ID: 24443.1530124053@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

David Wheeler <dwheeler(at)dgitsystems(dot)com> writes:
> I'm having performance trouble with a particular set of queries. It goes a bit like this

> 1) queue table is initially empty, and very narrow (1 bigint column)
> 2) we insert ~30 million rows into queue table
> 3) we do a join with queue table to delete from another table (delete from a using queue where a.id<http://a.id> = queue.id<http://queue.id>), but postgres stats say that queue table is empty, so it uses a nested loop over all 30 million rows, taking forever

Although there's no way to have any useful pg_statistic stats if you won't
do an ANALYZE, the planner nonetheless can see the table's current
physical size, and what it normally does is to multiply the last-reported
tuple density (reltuples/relpages) by the current size. So if you're
getting an "empty table" estimate anyway, I have to suppose that the
table's state involves reltuples = 0 and relpages > 0. That's not a
good place to be in; it constrains the planner to believe that the table
is in fact devoid of tuples, because that's what the last ANALYZE saw.

Now, the initial state for a freshly-created or freshly-truncated table
is *not* that. It is reltuples = 0 and relpages = 0, representing an
undefined tuple density. Given that, the planner will make some guess
about average tuple size --- which is likely to be a very good guess,
for a table with only fixed-width columns --- and then compute a rowcount
estimate using that plus the observed physical size.

So I think your problem comes from oscillating between really-empty
and not-at-all-empty, and not using an idiomatic way of going back
to the empty state. Have you tried using TRUNCATE instead of DELETE?

> This queue table is empty 99% of the time, and the query in 3 runs immediately after step 2. Is there any likelyhood that tweaking the autoanalyze params would help in this case? I don't want to explicitly analyze the table between steps 2 and three either as there are other patterns of use where for example 0 rows are inserted in step 2 and this is expected to run very very quickly. Do I have any other options?

I am not following your aversion to sticking an ANALYZE in there,
either. It's not like inserting 30 million rows would be free.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David Wheeler 2018-06-28 00:00:57 Re: Queue table that quickly grows causes query planner to choose poor plan
Previous Message Justin Pryzby 2018-06-27 18:05:18 Re: Queue table that quickly grows causes query planner to choose poor plan