From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: "Sven R. Kunze" <srkunze(at)mail(dot)de>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Speeding up JSON + TSQUERY + GIN
Date: 2017-02-26 20:13:03
Message-ID: CAF4Au4wVfx1or_DR1P9t4peheOm0ps-1bNhFDXnPdkZGLEpGqQ@mail.gmail.com
Lists: pgsql-performance
On Sun, Feb 26, 2017 at 4:28 PM, Sven R. Kunze <srkunze(at)mail(dot)de> wrote:
> Hello everyone,
>
> I am currently evaluating the possibility of using PostgreSQL for storing
> and querying jsonb + tsvector data. Let's consider this setup:
>
> create table docs (id serial primary key, meta jsonb);
> # generate 10M entries, cf. appendix
> create index docs_meta_idx ON docs using gin (meta jsonb_path_ops);
> create index docs_name_idx ON docs using gin (to_tsvector('english', meta->>'name'));
> create index docs_address_idx ON docs using gin (to_tsvector('english', meta->>'address'));
>
>
A functional index tends to be slow; better to use a separate column (or
columns) for the tsvector.
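A sketch of that approach, assuming made-up column, function, and trigger names (on 9.5 a trigger has to keep the column in sync, since generated columns only arrived in later releases):

```sql
alter table docs add column name_tsv tsvector;

update docs set name_tsv = to_tsvector('english', meta->>'name');

create function docs_tsv_update() returns trigger as $$
begin
    new.name_tsv := to_tsvector('english', new.meta->>'name');
    return new;
end
$$ language plpgsql;

create trigger docs_tsv_trg before insert or update on docs
    for each row execute procedure docs_tsv_update();

create index docs_name_tsv_idx on docs using gin (name_tsv);

-- queries then hit the plain column instead of re-evaluating the expression:
-- select id from docs where name_tsv @@ to_tsquery('english', 'john');
```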
>
> Testing around with some smaller datasets, functionality-wise it's great.
> However increasing to 10M, things tend to slow down (using PostgreSQL 9.5):
>
>
> explain analyze select id from docs where meta @> '{"age": 20}';
> Planning time: 0.121 ms
> Execution time: 4873.507 ms
>
> explain analyze select id from docs where meta @> '{"age": 20}';
> Planning time: 0.122 ms
> Execution time: 206.289 ms
>
>
>
> explain analyze select id from docs where meta @> '{"age": 30}';
> Planning time: 0.109 ms
> Execution time: 7496.886 ms
>
> explain analyze select id from docs where meta @> '{"age": 30}';
> Planning time: 0.114 ms
> Execution time: 1169.649 ms
>
>
>
> explain analyze select id from docs where to_tsvector('english',
> meta->>'name') @@ to_tsquery('english', 'john');
> Planning time: 0.179 ms
> Execution time: 10109.375 ms
>
> explain analyze select id from docs where to_tsvector('english',
> meta->>'name') @@ to_tsquery('english', 'john');
> Planning time: 0.188 ms
> Execution time: 238.854 ms
>
What is the full output from EXPLAIN ANALYZE?
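Buffer statistics usually tell the most here; an untested sketch against the schema above:

```sql
explain (analyze, buffers)
select id from docs where meta @> '{"age": 20}';
```

The BUFFERS counts show how much of the time goes to reading index and heap pages from disk versus shared buffers.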
>
>
> Using "select pg_prewarm('docs');", or prewarming any of the indexes, doesn't
> help either.
> After a "systemctl stop postgresql.service && sync && echo 3 >
> /proc/sys/vm/drop_caches && systemctl start postgresql.service" the age=20,
> 30 or name=john queries are slow again.
>
>
> Is there a way to speed up or to warm up things permanently?
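On the warm-up question: pg_prewarm works per relation, and the table and each index are separate relations, so all of them have to be prewarmed by name (a sketch, using the index names from the setup above):

```sql
create extension if not exists pg_prewarm;

select pg_prewarm('docs');
select pg_prewarm('docs_meta_idx');
select pg_prewarm('docs_name_idx');
select pg_prewarm('docs_address_idx');
```

As far as I know there is no persistent warm-up in 9.5; the effect lasts only until the pages are evicted or the caches are dropped, so this would have to be rerun after a restart.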
>
>
> Regards,
> Sven
>
>
> Appendix I:
>
> example json:
>
> {"age": 20, "name": "Michelle Hernandez", "birth": "1991-08-16",
> "address": "94753 Tina Bridge Suite 318\\nEmilyport, MT 75302"}
>
>
>
> Appendix II:
>
>
> The Python script to generate fake json data. Needs "pip install faker".
>
> >>> python fake_json.py > test.json  # generates 2M entries; takes some time
> >>> cat test.json | psql -c 'copy docs (meta) from stdin'
> >>> cat test.json | psql -c 'copy docs (meta) from stdin'
> >>> cat test.json | psql -c 'copy docs (meta) from stdin'
> >>> cat test.json | psql -c 'copy docs (meta) from stdin'
> >>> cat test.json | psql -c 'copy docs (meta) from stdin'
>
>
> -- fake_json.py --
>
> import faker, json
>
> fake = faker.Faker()
> for i in range(2*10**6):
>     print(json.dumps({"name": fake.name(), "birth": fake.date(),
>                       "address": fake.address(),
>                       "age": fake.random_int(0, 100)}).replace('\\n', '\\\\n'))
>
>