From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance issue with bitmap index scans on huge amounts of big jsonb documents
Date: 2016-12-09 17:03:29
Message-ID: CAHyXU0zY8EwNSFYaNrzOpK-vuWruBTeZSyjLcV07VREAPkFs4g@mail.gmail.com
Lists: pgsql-performance
On Wed, Nov 30, 2016 at 6:26 AM, Marc-Olaf Jaschke
<marc-olaf(dot)jaschke(at)s24(dot)com> wrote:
> Hi,
>
> I have a performance issue with bitmap index scans on huge amounts of big jsonb documents.
>
>
> ===== Background =====
>
> - table with big jsonb documents
> - gin index on these documents
> - queries using index conditions with low selectivity
>
>
> ===== Example =====
>
> select version();
>> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit
>
> show work_mem;
>> 1GB
>
> -- setup test data
> create table bitmap_scan_test as
> select
> i,
> (select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text))) from generate_series(0, 100) j) big_jsonb
> from
> generate_series(0, 100000) i;
>
> create index on bitmap_scan_test using gin (big_jsonb);
>
> analyze bitmap_scan_test;
>
>
> -- query with bitmap scan
> explain analyze
> select
> count(*)
> from
> bitmap_scan_test
> where
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> Aggregate (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 rows=1 loops=1)
> -> Bitmap Heap Scan on bitmap_scan_test (cost=120.78..272.49 rows=100 width=0) (actual time=16.496..617.431 rows=50000 loops=1)
> Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Heap Blocks: exact=637
> -> Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=50000 loops=1)
> Index Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Planning time: 0.106 ms
> Execution time: 622.334 ms
>
>
> perf top -p... shows heavy usage of pglz_decompress:
>
> Overhead Shared Object Symbol
> 51,06% postgres [.] pglz_decompress
> 7,33% libc-2.12.so [.] memcpy
Another thing to look at is configuring the column not to compress,
since over half the time is spent decompressing the data. See:

ALTER TABLE ... ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

Naturally this trades disk space and I/O for CPU time, so do some
careful analysis before making the change.
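As a sketch of that change, using the table and column names from the
example above (EXTERNAL keeps out-of-line TOAST storage but disables
pglz compression; note that SET STORAGE only affects newly stored
values, so existing rows stay compressed until rewritten — the no-op
update below is one hedged way to force that rewrite):

```sql
-- Sketch only, using the example's names: stop compressing the jsonb
-- column so the bitmap heap scan's recheck no longer has to run
-- pglz_decompress on every matching row.
alter table bitmap_scan_test
    alter column big_jsonb set storage external;

-- SET STORAGE applies only to newly stored tuples. Concatenating an
-- empty array leaves a jsonb array unchanged but produces a fresh
-- datum, so the rows are re-toasted under the new storage setting.
update bitmap_scan_test
    set big_jsonb = big_jsonb || '[]'::jsonb;

vacuum analyze bitmap_scan_test;
```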
merlin