From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Mark <mwchambers(at)gmail(dot)com> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2 |
Date: | 2019-01-02 16:11:39 |
Message-ID: | 2025.1546445499@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Mark <mwchambers(at)gmail(dot)com> writes:
> I have a long running job which deletes all of the common_student table and
> then repopulates it. It takes long time to load all the other data and
> commit the transaction. I didn't think the delete inside the transaction
> would have any effect until it is commited or rolled back.
> I will have to rewrite the application so it updates the existing rows
> rather than deleting all and then inserting.
Hmm ... I'm not sure if that will actually make things better. The root
of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:
* We count delete-in-progress rows as still live, using
* the same reasoning given above; but we don't bother to
* include them in the sample.
The "reasoning given above" is a reference to what happens for
INSERT_IN_PROGRESS tuples:
* Insert-in-progress rows are not counted. We assume
* that when the inserting transaction commits or aborts,
* it will send a stats message to increment the proper
* count. This works right only if that transaction ends
* after we finish analyzing the table; if things happen
* in the other order, its stats update will be
* overwritten by ours. However, the error will be large
* only if the other transaction runs long enough to
* insert many tuples, so assuming it will finish after us
* is the safer option.
Now the problem with this, from your perspective, is that *neither*
INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
ANALYZE's data sample. So a long-running update transaction will
cause all the rows it changes to be excluded from the sample until
commit. This seems like a bad thing, and it definitely means that
adjusting your app as you're contemplating won't help.
In order to handle the bulk-update scenario sanely, it seems like
we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
to be included. And it looks like, for consistency with the row-counting
logic, the one that's included needs to be DELETE_IN_PROGRESS. This
is a slightly annoying conclusion, because it means we're accumulating
stats that we know are likely to soon be obsolete, but I think sampling
INSERT_IN_PROGRESS tuples instead would lead to very strange results
in some cases.
In short, I think we want to do this to the DELETE_IN_PROGRESS logic:
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
deadrows += 1;
else
+ {
+ sample_it = true;
liverows += 1;
+ }
with suitable adjustment of the adjacent comment.
Thoughts?
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Rich Shepard | 2019-01-02 17:12:47 | Implementing standard SQL's DOMAIN constraint |
Previous Message | Mark | 2019-01-02 16:04:03 | Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2 |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-01-02 16:35:46 | Re: [PATCH] check for ctags utility in make_ctags |
Previous Message | Mark | 2019-01-02 16:04:03 | Re: Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2 |