From: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: GIN improvements part 1: additional information |
Date: | 2013-07-06 16:42:58 |
Message-ID: | 51D84912.2080000@fuzzy.cz |
Lists: | pgsql-hackers |
Hi,
I've done a fair amount of testing by loading pgsql-general archives
into a database and running a bunch of simple ts queries that use a GIN
index.
I've tested this as well as the two other patches, but as I was able to
get meaningful results only from this patch, I'll post the results here
and info about segfaults and other observed errors to the other threads.
First of all - please update the commitfest page whenever you submit a
new patch version. I've spent two or three hours testing and debugging
patches linked from those pages, only to find out that there are newer
versions. I should have checked that initially, but let's keep that
updated.
I wasn't able to apply the patches to the current head, so I've used
b8fd1a09 (from 17/06) as a base commit.
The following table shows these metrics:
* data load
  - how long it took to import ~200k messages from the list archive
  - includes a lot of time spent in Python (parsing), checking FKs ...
  - so unless this is significantly higher, it's probably OK
* index size
  - size of the main GIN index on message body
* 1/2/3-word(s)
  - number of queries in the form

      SELECT id FROM messages
      WHERE body_tsvector @@ plainto_tsquery('english', 'w1 w2')
      LIMIT 100

    (executed over 60 seconds, and 'per second' speed)
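To make the 1/2/3-word metric concrete, here is a minimal sketch of a fixed-duration counting loop. It is not taken from the actual scripts; the names `count_queries`, `run_query` and `clock` are hypothetical, and the database call is left to the caller so the loop can be exercised without a server.

```python
import time
from typing import Callable, Tuple


def count_queries(run_query: Callable[[], None],
                  duration_s: float = 60.0,
                  clock: Callable[[], float] = time.monotonic) -> Tuple[int, float]:
    """Call run_query repeatedly until duration_s seconds have elapsed.

    Returns (number of completed queries, queries per second).
    The clock is injectable so the loop can be tested with a fake timer.
    """
    start = clock()
    deadline = start + duration_s
    completed = 0
    # A query that starts before the deadline is allowed to finish,
    # so the measured interval may slightly exceed duration_s.
    while clock() < deadline:
        run_query()
        completed += 1
    elapsed = clock() - start
    return completed, completed / elapsed
```

In a real run, `run_query` would be a small function that executes the SELECT above through a database cursor with a fresh set of words each time; that wiring is assumed here and not shown.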
All the scripts are available at https://bitbucket.org/tvondra/archie
Now, the results:
no patches:
  data load:  710 s
  index size: 545 MB
  1 word:     37500 (630/s)
  2 words:    49800 (800/s)
  3 words:    40000 (660/s)

additional info (ginaddinfo.7.patch):
  data load:  693 s
  index size: 448 MB
  1 word:     135000 (2250/s)
  2 words:    85000 (1430/s)
  3 words:    54000 ( 900/s)

additional info + fast scan (gin_fast_scan.4.patch):
  data load:  720 s
  index size: 455 MB
  1 word:     FAIL
  2 words:    FAIL
  3 words:    FAIL

additional info + fast scan + ordering (gin_ordering.4.patch):
  data load:  FAIL
  index size: N/A
  1 word:     N/A
  2 words:    N/A
  3 words:    N/A
So the speedup after adding info into GIN seems very promising, although
I don't quite understand why searching for two words is so much slower.
Also the index size seems to decrease significantly.
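For reference, the relative gains can be worked out from the figures reported above. This is just arithmetic on the 60-second query counts and index sizes for "no patches" vs. ginaddinfo.7.patch; the variable and function names are made up for illustration.

```python
# Query counts over 60 seconds, copied from the results above.
baseline = {"1 word": 37500, "2 words": 49800, "3 words": 40000}
patched = {"1 word": 135000, "2 words": 85000, "3 words": 54000}


def speedups(before, after):
    """Ratio of patched to unpatched query counts, per query type."""
    return {name: after[name] / before[name] for name in before}


ratios = speedups(baseline, patched)

# Index size in MB: 545 without patches, 448 with additional info.
index_shrink = 1 - 448 / 545

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"index size reduction: {index_shrink:.1%}")
```

That gives roughly 3.6x for one word, 1.7x for two words and 1.35x for three words, with the index about 18% smaller.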
After applying 'fast scan', things started to break down: I wasn't able
to run the queries, and then even the load failed consistently.
I'll post the info into the appropriate threads.
Tomas