From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance issue with bitmap index scans on huge amounts of big jsonb documents
Date: 2016-12-09 17:03:29
Message-ID: CAHyXU0zY8EwNSFYaNrzOpK-vuWruBTeZSyjLcV07VREAPkFs4g@mail.gmail.com
Lists: pgsql-performance
On Wed, Nov 30, 2016 at 6:26 AM, Marc-Olaf Jaschke
<marc-olaf(dot)jaschke(at)s24(dot)com> wrote:
> Hi,
>
> I have a performance issue with bitmap index scans on huge amounts of big jsonb documents.
>
>
> ===== Background =====
>
> - table with big jsonb documents
> - gin index on these documents
> - queries using index conditions with low selectivity
>
>
> ===== Example =====
>
> select version();
>> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit
>
> show work_mem;
>> 1GB
>
> -- setup test data
> create table bitmap_scan_test as
> select
> i,
> (select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text))) from generate_series(0, 100) j) big_jsonb
> from
> generate_series(0, 100000) i;
>
> create index on bitmap_scan_test using gin (big_jsonb);
>
> analyze bitmap_scan_test;
>
>
> -- query with bitmap scan
> explain analyze
> select
> count(*)
> from
> bitmap_scan_test
> where
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> Aggregate (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 rows=1 loops=1)
> -> Bitmap Heap Scan on bitmap_scan_test (cost=120.78..272.49 rows=100 width=0) (actual time=16.496..617.431 rows=50000 loops=1)
> Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Heap Blocks: exact=637
> -> Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=50000 loops=1)
> Index Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Planning time: 0.106 ms
> Execution time: 622.334 ms
>
>
> perf top -p... shows heavy usage of pglz_decompress:
>
> Overhead Shared Object Symbol
> 51,06% postgres [.] pglz_decompress
> 7,33% libc-2.12.so [.] memcpy
Another thing to look at is configuring the column not to compress,
since over half the time is spent decompressing the data. See:

ALTER TABLE ... ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

Naturally this trades disk space and I/O for CPU time, so do some
careful analysis before making the change.
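As a sketch of that change, using the table and column names from the
example above (EXTERNAL keeps out-of-line TOAST storage but disables
pglz compression; note that SET STORAGE only affects newly stored
values, so existing rows stay compressed until rewritten — the no-op
update below is one hedged way to force that rewrite):

```sql
-- Sketch only, using the example's names: stop compressing the jsonb
-- column so the bitmap heap scan's recheck no longer has to run
-- pglz_decompress on every matching row.
alter table bitmap_scan_test
    alter column big_jsonb set storage external;

-- SET STORAGE applies only to newly stored tuples. Concatenating an
-- empty array leaves a jsonb array unchanged but produces a fresh
-- datum, so the rows are re-toasted under the new storage setting.
update bitmap_scan_test
    set big_jsonb = big_jsonb || '[]'::jsonb;

vacuum analyze bitmap_scan_test;
```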
merlin