Quick Links

Re: Index no longer being used, destroying and recreating it restores use.

From:	Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To:	Koen De Groote <kdg(dot)dev(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: Index no longer being used, destroying and recreating it restores use.
Date:	2020-06-05 14:15:27
Message-ID:	92ea007a-abe0-1dd2-7e5c-080da70c2e2c@aklaver.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 6/5/20 7:05 AM, Koen De Groote wrote:
> I've collected all relevant info(I think so at least) and put it here:
>
> The table in question is used to keep filepath data, of files on a
> harddrive.
> The query in question is used to retrieve items which should be backed
> up, but have not yet been.
>
> The relevant columns of the table:
>
> Table "public.item"
> Column | Type |
> Collation | Nullable | Default
> ------------------------------------+-----------------------------+-----------+----------+----------------------------------------------
> id | bigint |
> | not null | nextval('item_id_seq'::regclass)
> shouldbebackedup | boolean |
> | not null | true
> backupperformed | boolean |
> | not null | false
> itemCreated | timestamp without time zone |
> | | now()
> filepath | text |
> | |
>
>
> The existing index, which no longer gets used:
> "index_in_question" btree (shouldbebackedup, backupperformed,
> itemCreated, filepath) WHERE shouldbebackedup = true AND backupperformed
> = false
>
> The new index, made out of the exact same columns and conditions, get
> used immediately after creation:
> CREATE INDEX CONCURRENTLY index_test ON item USING btree
> (shouldbebackedup, backupperformed, itemCreated, filepath) WHERE
> shouldbebackedup = true AND backupperformed = false;
>
>
> The query in question will look something like this:
> select * from item where shouldbebackedup=true and
> itemCreated<='2020-06-05 00:00:00.000' and backupperformed=false order
> by filepath asc, id asc limit 100 offset 10400;

The result of EXPLAIN ANALYZE for above.

>
> Having done a count, there are around 13000 items here, without the
> offset and limit.
> That being said, the amount is entirely dependant on what was added on a
> previous day.
>
>
> I tried creating an extended statistic, like this, but it had no effect:
> CREATE STATISTICS s1 (dependencies) ON shouldbebackedup, backupperformed
> FROM item;
>
> Settings from the conf file I think are related:
>
> shared_buffers = 1024MB
> effective_cache_size = 2048MB
> random_page_cost = 1.1
> effective_io_concurrency = 200
> work_mem = 32MB
>
> Finally, I state again that this database gets a nightly "vacuum analyze".
>
> My thanks for looking at this and any suggestions one might have.
>
> Regards,
> Koen
>
> On Thu, Jun 4, 2020 at 7:08 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
>
> Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com
> <mailto:adrian(dot)klaver(at)aklaver(dot)com>> writes:
> > On 6/4/20 9:43 AM, Tom Lane wrote:
> >> It's possible that the index had bloated to the point where the
> planner
> >> thought it was cheaper to use a seqscan. Did you make a note of the
> >> cost estimates for the different plans?
>
> > I missed the part where the OP pointed to a SO question. In that
> > question where links to explain.depesz.com
> <http://explain.depesz.com> output.
>
> Ah, I didn't bother to chase that link either.
>
> So the cost estimates are only a fraction of a percent apart, making
> it unsurprising for not-so-large changes in the index size to cause
> a flip in the apparently-cheapest plan. The real question then is
> why the cost estimates aren't actually modeling the real execution
> times very well; and I'd venture that that question boils down to
> why is this rowcount estimate so far off:
>
> > -> Parallel Seq Scan on oscar mike_three
> > (cost=0.000..1934568.500 rows=2385585 width=3141) (actual
> > time=159.800..158018.961 rows=23586 loops=3)
> > Filter: (four AND (NOT bravo) AND (zulu <=
> > 'echo'::timestamp without time zone))
> > Rows Removed by Filter: 8610174
>
> We're not going to be able to answer that if the OP doesn't wish
> to decloak his data a bit more ... but a reasonable guess is that
> those filter conditions are correlated. With late-model Postgres
> you might be able to improve matters by creating extended statistics
> for this table.
>
> regards, tom lane
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Re: Index no longer being used, destroying and recreating it restores use. at 2020-06-05 14:05:21 from Koen De Groote

Responses

Re: Index no longer being used, destroying and recreating it restores use. at 2020-06-05 15:34:34 from Koen De Groote

Browse pgsql-general by date

	From	Date	Subject
Next Message	Thomas Kellerer	2020-06-05 14:19:36	Re: Oracle vs. PostgreSQL - a comment
Previous Message	Koen De Groote	2020-06-05 14:05:21	Re: Index no longer being used, destroying and recreating it restores use.