Speeding up JSON + TSQUERY + GIN

From: "Sven R(dot) Kunze" <srkunze(at)mail(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Speeding up JSON + TSQUERY + GIN
Date: 2017-02-26 13:28:10
Message-ID: 2260e4e6-89ae-90dc-278c-81243b3c3510@mail.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello everyone,

I am currently evaluating the possibility of using PostgreSQL for
storing and querying jsonb+tsvector queries. Let's consider this setup:

create table docs (id serial primary key, meta jsonb);
# generate 10M entries, cf. appendix
create index docs_meta_idx ON docs using gin (meta jsonb_path_ops);
create index docs_name_idx ON docs using gin (to_tsvector('english',
meta->>'name'));
create index docs_address_idx ON docs using gin (to_tsvector('english',
meta->>'address'));

Testing around with some smaller datasets, functionality-wise it's
great. However increasing to 10M, things tend to slow down (using
PostgreSQL 9.5):

explain analyze select id from docs where meta @> '{"age": 20}';
Planning time: 0.121 ms
Execution time: 4873.507 ms

explain analyze select id from docs where meta @> '{"age": 20}';
Planning time: 0.122 ms
Execution time: 206.289 ms

explain analyze select id from docs where meta @> '{"age": 30}';
Planning time: 0.109 ms
Execution time: 7496.886 ms

explain analyze select id from docs where meta @> '{"age": 30}';
Planning time: 0.114 ms
Execution time: 1169.649 ms

explain analyze select id from docs where to_tsvector('english',
meta->>'name') @@ to_tsquery('english', 'john');
Planning time: 0.179 ms
Execution time: 10109.375 ms

explain analyze select id from docs where to_tsvector('english',
meta->>'name') @@ to_tsquery('english', 'john');
Planning time: 0.188 ms
Execution time: 238.854 ms

Using "select pg_prewarm('docs');" and on any of the indexes doesn't
help either.
After a "systemctl stop postgresql.service && sync && echo 3 >
/proc/sys/vm/drop_caches && systemctl start postgresql.service" the
age=20, 30 or name=john queries are slow again.

Is there a way to speed up or to warm up things permanently?

Regards,
Sven

Appendix I:

example json:

{"age": 20, "name": "Michelle Hernandez", "birth": "1991-08-16",
"address": "94753 Tina Bridge Suite 318\\nEmilyport, MT 75302"}

Appendix II:

The Python script to generate fake json data. Needs "pip install faker".

>>> python fake_json.py > test.json # generates 2M entries; takes some
time
>>> cat test.json | psql -c 'copy docs (meta) from stdin'
>>> cat test.json | psql -c 'copy docs (meta) from stdin'
>>> cat test.json | psql -c 'copy docs (meta) from stdin'
>>> cat test.json | psql -c 'copy docs (meta) from stdin'
>>> cat test.json | psql -c 'copy docs (meta) from stdin'

-- fake_json.py --

import faker, json;
fake = faker.Faker();
for i in range(2*10**6):
print(json.dumps({"name": fake.name(), "birth": fake.date(),
"address": fake.address(), "age":
fake.random_int(0,100)}).replace('\\n', '\\\\n'))

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Oleg Bartunov 2017-02-26 20:13:03 Re: Speeding up JSON + TSQUERY + GIN
Previous Message Dinesh Chandra 12108 2017-02-23 17:21:35 Re: How to log quires which are taking time in PostgreSQL 9.1.