Re: Index no longer being used, destroying and recreating it restores use.

From: Koen De Groote <kdg(dot)dev(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Index no longer being used, destroying and recreating it restores use.
Date: 2020-06-05 15:34:34
Message-ID: CAGbX52FSLneL7U7bdm5hpY6MTxY30tPBE08uGoezWPaZRjLh0A@mail.gmail.com
Lists: pgsql-general

The requested result: https://explain.depesz.com/s/G7mU

Also, the data from the statistic itself:

=> SELECT stxname, stxkeys, stxdependencies
-> FROM pg_statistic_ext
-> WHERE stxname = 's1';
 stxname | stxkeys | stxdependencies
---------+---------+-----------------
 s1      | 29 35   | <NULL>
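One thing the output above suggests: `stxdependencies` is still `<NULL>`, and functional-dependency statistics are only computed when the table is analyzed after the statistics object is created. A minimal sketch of checking this, assuming the `item` table and the `s1` statistic quoted below (on PostgreSQL 10/11, where `stxdependencies` lives in `pg_statistic_ext`):

```sql
-- Recompute table statistics so the extended statistic gets populated.
-- ("item" and "s1" are the names from this thread.)
ANALYZE item;

-- After ANALYZE, stxdependencies should contain degree-of-dependency
-- values rather than <NULL>.
SELECT stxname, stxkeys, stxdependencies
FROM pg_statistic_ext
WHERE stxname = 's1';
```

If the nightly "vacuum analyze" ran before the statistic was created, this would explain why creating it appeared to have no effect.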

On Fri, Jun 5, 2020 at 4:15 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 6/5/20 7:05 AM, Koen De Groote wrote:
> > I've collected all relevant info (I think so, at least) and put it here:
> >
> > The table in question stores filepath data for files on a hard drive.
> > The query in question retrieves items which should be backed up but
> > have not yet been.
> >
> > The relevant columns of the table:
> >
> >                      Table "public.item"
> >       Column       |            Type             | Collation | Nullable |             Default
> > -------------------+-----------------------------+-----------+----------+----------------------------------
> >  id                | bigint                      |           | not null | nextval('item_id_seq'::regclass)
> >  shouldbebackedup  | boolean                     |           | not null | true
> >  backupperformed   | boolean                     |           | not null | false
> >  itemCreated       | timestamp without time zone |           |          | now()
> >  filepath          | text                        |           |          |
> >
> >
> > The existing index, which no longer gets used:
> > "index_in_question" btree (shouldbebackedup, backupperformed,
> > itemCreated, filepath) WHERE shouldbebackedup = true AND backupperformed
> > = false
> >
> > The new index, made of the exact same columns and conditions, gets
> > used immediately after creation:
> > CREATE INDEX CONCURRENTLY index_test ON item USING btree
> > (shouldbebackedup, backupperformed, itemCreated, filepath) WHERE
> > shouldbebackedup = true AND backupperformed = false;
> >
> >
> > The query in question will look something like this:
> > select * from item where shouldbebackedup=true and
> > itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order
> > by filepath asc, id asc limit 100 offset 10400;
>
> The result of EXPLAIN ANALYZE for above.
>
> >
> > Having done a count, there are around 13000 matching items, without the
> > offset and limit.
> > That being said, the amount is entirely dependent on what was added on a
> > previous day.
> >
> >
> > I tried creating an extended statistic, like this, but it had no effect:
> > CREATE STATISTICS s1 (dependencies) ON shouldbebackedup, backupperformed
> > FROM item;
> >
> > Settings from the conf file I think are related:
> >
> > shared_buffers = 1024MB
> > effective_cache_size = 2048MB
> > random_page_cost = 1.1
> > effective_io_concurrency = 200
> > work_mem = 32MB
> >
> > Finally, I state again that this database gets a nightly "vacuum
> analyze".
> >
> > My thanks for looking at this and any suggestions one might have.
> >
> > Regards,
> > Koen
> >
> > On Thu, Jun 4, 2020 at 7:08 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> > <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
> >
> > Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com
> > <mailto:adrian(dot)klaver(at)aklaver(dot)com>> writes:
> > > On 6/4/20 9:43 AM, Tom Lane wrote:
> > >> It's possible that the index had bloated to the point where the
> > planner
> > >> thought it was cheaper to use a seqscan. Did you make a note of
> the
> > >> cost estimates for the different plans?
> >
> > > I missed the part where the OP pointed to a SO question. In that
> > > question there were links to explain.depesz.com
> > <http://explain.depesz.com> output.
> >
> > Ah, I didn't bother to chase that link either.
> >
> > So the cost estimates are only a fraction of a percent apart, making
> > it unsurprising for not-so-large changes in the index size to cause
> > a flip in the apparently-cheapest plan. The real question then is
> > why the cost estimates aren't actually modeling the real execution
> > times very well; and I'd venture that that question boils down to
> > why is this rowcount estimate so far off:
> >
> > > -> Parallel Seq Scan on oscar mike_three
> > > (cost=0.000..1934568.500 rows=2385585 width=3141) (actual
> > > time=159.800..158018.961 rows=23586 loops=3)
> > > Filter: (four AND (NOT bravo) AND (zulu <=
> > > 'echo'::timestamp without time zone))
> > > Rows Removed by Filter: 8610174
> >
> > We're not going to be able to answer that if the OP doesn't wish
> > to decloak his data a bit more ... but a reasonable guess is that
> > those filter conditions are correlated. With late-model Postgres
> > you might be able to improve matters by creating extended statistics
> > for this table.
> >
> > regards, tom lane
> >
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
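Independent of the index question: the quoted query uses `OFFSET 10400`, which forces the executor to fetch and discard 10,400 rows on every execution. Since the query already orders by `(filepath, id)`, keyset pagination may sidestep that cost while keeping the partial index usable. A sketch only, using the table and columns quoted above; `:last_filepath` and `:last_id` are hypothetical placeholders for the values from the final row of the previous page:

```sql
-- Keyset (cursor) pagination sketch: resume after the last row seen,
-- instead of skipping rows with OFFSET.
SELECT *
FROM item
WHERE shouldbebackedup = true
  AND backupperformed = false
  AND itemCreated <= '2020-06-05 00:00:00.000'
  AND (filepath, id) > (:last_filepath, :last_id)  -- row-wise comparison
ORDER BY filepath ASC, id ASC
LIMIT 100;
```

The row-wise comparison `(filepath, id) > (…)` matches the `ORDER BY`, so each page continues exactly where the previous one stopped.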
