Re: Drastic select count performance hit when jsonb GIN indices are present

From: Anton Melser <melser(dot)anton(at)gmail(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Drastic select count performance hit when jsonb GIN indices are present
Date: 2014-12-26 16:26:14
Message-ID: CAKywjPoxbVsz_GSmci-WGd3_tySDWLoErioukvQ1thfv-qM7jg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Maxim,

Thanks for the suggestions. Unfortunately, it does appear to be an IO issue
but not one I can get around (on my laptop at least!). Postgres seems to
refuse to put any of the index or table in shared buffers at all. For some
reason, there seems to be a very great increase in space required for
having each event (item) on its own row as opposed to having it with the
original items array within far fewer database rows:

relation | size
-------------------------------------+------------
public.myevents | 3502 MB
...
pg_toast.pg_toast_57487 | 800 MB
...
public.events | 2232 kB
...

myevents is where each event has a row to itself and events has 100 events
in an array within the JSON rows (and the toast table above is where the
data seem to be stored) - actually the events table has more data in it as
there are the extra paging objects which have been removed from myevents.
Performing vacuum analyze seems to have no effect either.

Getting back to my original point - you pointed out that for queries that
need a decent % of the table it will be cheaper to do a scan, which is
exactly what the query planner does for the relational version. If it only
needs a small % of the values it looks at the index and for a large % it
goes for a scan (it also puts everything in shared buffers and is
lightening quick!). Is this just a lack of maturity in the jsonb planner or
am I missing something?

Thanks again,
Anton

On 26 December 2014 at 14:19, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:

> -> Bitmap Heap Scan on myevents (cost=35.80..3615.09 rows=3716
> width=0) (actual time=351.510..77669.907 rows=1417152 loops=1)
>
>> Recheck Cond: (event @> '{"event": "delivered"}'::jsonb)
>> Heap Blocks: exact=298362
>> ​​
>> Buffers: shared hit=1 read=298589
>>
> ​...​
>
>> Execution time: 80986.340 ms
>>
>
>
>> -> Bitmap Heap Scan on
>> ​​
>> myevents (cost=42.80..3622.09 rows=3716 width=0) (actual
>> time=534.816..78526.944 rows=1417152 loops=1)
>> Recheck Cond: (event @> '{"event": "delivered"}'::jsonb)
>> Heap Blocks: exact=298362
>> Buffers: shared hit=383 read=299133
>>
> ​...
>> Execution time: 81898.578 ms
>>
>
> ​Hi Anton,
>
> What you see there (i think) - it's a performance hit of random disk read
> for non-cached database.
> Try increase a shared buffers to value when table and index could fit
> into, and redo queries few time until you see something like
> Buffers: shared hit=bigvalue read=0 and compare performance, it might
> change timing quite a lot.
>
> Also, I recommend set track_io_timing=on in postgresql.conf and after it
> use explain (analyze, buffers, timing) to see check how much time database
> spent doing IO operations.
> Also try perform vacuum analyze ​myevents; before testing because it seems
> that you have no up to date visibility map on the table.
>
>
> However, even in fully cached case selecting 40% on the table rows almost
> always will be faster via sequential scan, so I don't expect miracles.
>
> --
> Maxim Boguk
> Senior Postgresql DBA
> http://www.postgresql-consulting.ru/
> <http://www.postgresql-consulting.com/>
>
> Phone RU: +7 910 405 4718
> Phone AU: +61 45 218 5678
>
> LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
> Skype: maxim.boguk
> Jabber: maxim(dot)boguk(at)gmail(dot)com
> МойКруг: http://mboguk.moikrug.ru/
>
> "People problems are solved with people.
> If people cannot solve the problem, try technology.
> People will then wish they'd listened at the first stage."
>

--
echo '16i[q]sa[ln0=aln100%Pln100/snlbx]sbA0D4D465452snlbxq' | dc
This will help you for 99.9% of your problems ...

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2014-12-26 16:27:33 Re: Check constraint on foreign table using SQL function
Previous Message Cory Zue 2014-12-26 15:43:56 Re: help troubleshooting invalid page header error