Re: Many DataFileRead - IO waits

From: Ben Snaidero <bensnaidero(at)geotab(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Many DataFileRead - IO waits
Date: 2020-02-28 14:01:50
Message-ID: CAEPE5bOtH=8dwFyu8MSOiksK56jAH7j_-i6c-_JatFx=6UofYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Feb 27, 2020 at 11:54 AM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> How big is ix_tabledata_intid_timestampdate_intotherid3_intotherid2 on
> disk? If you create another index with same fields, how much space does it
> take? Real question- are you vacuuming aggressively enough for your
> workload? Your index name seems to indicate that intotherid3 would be the
> third key, and yet the planner chose not to scan that deep and instead
> filtered after it found the relevant tuples based on intid and
> timestampdate. That seems peculiar to me.
>
> The documentation discourages multi-column indexes because they have
> limited application unless the same fields are always used. Personally, I
> don't love reviewing the stats of indexscans or how many tuples were
> fetched and having to guess how deeply the index was scanned for the
> various queries involved.
>
> I'd wonder if an index on only intid_timestampdate would be both much
> smaller and also have a more right-leaning pattern of information being
> added and accessed in terms of keeping frequently needing blocks in shared
> buffers.
>
> As a side note, that planning time seems high to me for such a simple
> query. Have you increased default_statistics_target significantly perhaps?
>

In this case the index is quite large ~400GB but as you can see from the
explain plan it's doing a backward scan and not accessing that many
buffers. Other servers with this issue are much smaller. We have
autovacuum set to the default setting but this table does not get any
deletes so I don't think that is the problem. I think the reason it does
not go deeper into the index keys is because it's just looking for the
first occurence based on date (limit 1) not all of them although even if
looking for all of them I think it would still scan in the same way since
there would be other intotherid3 values between the ones in this search key

In regards to default_statistics_target I have not increased this value at
all.

All this said regarding statistics and vacuum/bloat we restored a two day
old copy of the database (on one of the servers experiencing the issue) and
the issue was still present. These systems are all on cloud infrastructure
so I am leaning towards it being something hardware related (especially as
it's only happening on a few servers) but our cloud provider says nothing
has changed in this respect.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Stehule 2020-02-28 15:30:28 Re: proposal: schema variables
Previous Message Michael Lewis 2020-02-27 16:54:04 Re: Many DataFileRead - IO waits