Re: GIN index on JSONB not used due to lack of nested statistics

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Alessandro Baretta <alex(at)atomifi(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: GIN index on JSONB not used due to lack of nested statistics
Date: 2019-10-30 22:31:44
Message-ID: CAMkU=1ywOYFBgYfCzwkdGCEs4xJ0zUs_kSYFL36fcETEr64YGQ@mail.gmail.com
Lists: pgsql-performance

On Wed, Oct 30, 2019 at 12:25 PM Alessandro Baretta <alex(at)atomifi(dot)com>
wrote:

> -> Bitmap Index Scan on idx_object
> (cost=0.00..75580.00 rows=1 width=0) (actual time=24.094..24.094
> rows=77 loops=1)
> Index Cond: ((... @> ...::jsonb) AND (...
> @> ...::jsonb))
> Planning time: 0.301 ms
> Execution time: 24.723 ms
> (14 rows)
>
> It would seem that this miscalculation of the cost of the index scan
> is due to the query planner lacking detailed statistics about the
> relevant JSONB column.

Since it expected 1 row but actually found 77, I think that if it had
accurate statistics it would have overestimated the costs by even more.

Can you repeat the executions with "EXPLAIN (ANALYZE, BUFFERS)"?
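
For example, something along these lines (the table name and containment
conditions below are placeholders for your real ones):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM your_table
    WHERE data @> '{"key": "value"}'::jsonb
      AND data @> '{"other": "thing"}'::jsonb;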

How does the cost estimate change if you make effective_cache_size much
larger or much smaller? (No need for ANALYZE, just the cost estimate)
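
Something like this in the same session, so only the estimate changes (the
values here are just illustrative):

    SET effective_cache_size = '64GB';   -- then also try something small like '512MB'
    EXPLAIN SELECT * FROM your_table WHERE ...;   -- plain EXPLAIN, no ANALYZE needed
    RESET effective_cache_size;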

What kind of performance do you get if you turn enable_seqscan off and then
repeat the query from a cold start (restart PostgreSQL, then run sudo sh -c
"echo 3 > /proc/sys/vm/drop_caches")? If the performance is very fast
after a cold start, then something is wrong with the planner estimate. If
it is slow from a cold start, then the planner has at least a plausible
basis for charging as much as it does.
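
In the session, roughly (placeholder query again):

    SET enable_seqscan = off;
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM your_table WHERE ...;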

If you run the query with just one branch of your AND at a time, what is
the expected and actual number of rows?
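
That is, each containment condition on its own, along these lines (again with
placeholder predicates):

    EXPLAIN ANALYZE SELECT * FROM your_table WHERE data @> '{"key": "value"}'::jsonb;
    EXPLAIN ANALYZE SELECT * FROM your_table WHERE data @> '{"other": "thing"}'::jsonb;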

> The workaround I found so far is to set a low value of
> random_page_cost, but this could result in the query planner using
> index scans for other tables and other queries, where a seq scan would
> actually be more appropriate.
>

Based on what you know about your IO system, and the cacheability of your
data, what is the appropriate setting of random_page_cost from first
principles? Maybe it is those other queries which have the problem, not
this one.
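
If it does turn out to be the other queries that are mis-costed, note that
random_page_cost does not have to be set globally; it can be set per session
or per tablespace, for example (the tablespace name here is hypothetical):

    SET random_page_cost = 1.1;                                -- current session only
    ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);   -- per-tablespace override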

If you can come up with a random object generator which creates data structured
similarly to yours, and shows the same issue when run with disclosable
queries, that would help us look into it.
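
As a rough sketch of the kind of thing meant (the document shape and row count
here are made up and would need to mimic your real data):

    CREATE TABLE jsonb_test AS
    SELECT jsonb_build_object(
             'id', g,
             'tags', jsonb_build_array(md5(random()::text), md5(random()::text)),
             'attrs', jsonb_build_object(
               'color', (ARRAY['red','green','blue'])[1 + floor(random()*3)::int])
           ) AS data
    FROM generate_series(1, 1000000) AS g;

    CREATE INDEX ON jsonb_test USING gin (data);
    ANALYZE jsonb_test;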

Also, what version are you running?

Cheers,

Jeff
