Re: Fast distinct not working as expected

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Franck Routier <franck(dot)routier(at)axege(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Fast distinct not working as expected
Date: 2014-04-17 18:17:28
Message-ID: CAMkU=1y6bd7h6oTLK5BAVS1N3ayNcUCrzRBVUv1OykkNeo6P=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Apr 17, 2014 at 10:17 AM, Franck Routier
<franck(dot)routier(at)axege(dot)com>wrote:

>
>
> My best guess would be that the index got stuffed full of entries for
> rows that are not visible, either because they are not yet committed, or
> have been deleted but are not yet vacuumable. Do you have any long-lived
> transactions?
>
> There has been a delete on the table (about 20% of the records). Then a
> manual VACUUM.
> We have recreated the index, but it did not help.
>

If there are any open transactions (even ones that have never touched this
particular table) which started before the delete was committed, then the
vacuum was obliged to keep those deleted records around, in case that open
transaction happens to start caring about them. I assume that the deleted
rows were not randomly distributed, but rather were concentrated in the
exact range you are now inspecting.

The reindex was also obliged to carry those deleted but not yet
uninteresting rows along to the new index.

>
> In the explain analyze output, the index scan begins at 5798.912. What can
> be happening before that ?
>

The index scan reports it first *visible* row at 5798.912. Before that, it
was probably digging through thousands or millions of deleted rows,
labouriously verifying that they are not visible to it (but still
theoretically visible to someone else).

It could be blocked on a lock or something, or you could have really
horrible IO contention that takes 5 seconds to read two blocks. But I
think the other scenario is more likely.

By the way, many people don't like silent cross-posting, as then we end up
unknowningly answering a question here that was already answered elsewhere.

http://stackoverflow.com/questions/23137713/postgresql-query-plan-delay

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Roxanne Reid-Bennett 2014-04-18 05:46:36 Re: Approach to Data Summary and Analysis
Previous Message Franck Routier 2014-04-17 17:17:48 Re: Fast distinct not working as expected