From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
Cc: | Stephan Vollmer <svollmer(at)gmx(dot)de>, pgsql-performance(at)postgreSQL(dot)org |
Subject: | Re: [GENERAL] Creation of tsearch2 index is very slow |
Date: | 2006-01-20 21:50:17 |
Message-ID: | 27264.1137793817@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-performance |
Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> Given that all it's doing is counting bits, a simple fix would be to
> loop over bytes, use XOR and count ones. For extreme speedup create a
> lookup table with 256 entries to give you the answer straight away...
Yeah, I just finished doing that and got about a 20x overall speedup
(30-some seconds to build the index instead of 10 minutes). However,
hemdistsign is *still* 70% of the runtime after doing that. The problem
seems to be that gtsvector_picksplit is calling it an inordinate number
of times:
0.53 30.02 1649/1649 FunctionCall2 <cycle 2> [19]
[20] 52.4 0.53 30.02 1649 gtsvector_picksplit [20]
29.74 0.00 23519673/33035195 hemdistsign [18]
0.14 0.00 22985469/22985469 hemdistcache [50]
0.12 0.00 268480/10030581 makesign [25]
0.02 0.00 270400/270400 fillcache [85]
0.00 0.00 9894/4077032 AllocSetAlloc [34]
0.00 0.00 9894/2787477 MemoryContextAlloc [69]
(hemdistcache calls hemdistsign --- I think gprof is doing something
funny with tail-calls here, and showing hemdistsign as directly called
from gtsvector_picksplit when control really arrives through hemdistcache.)
The bulk of the problem is clearly in this loop, which performs O(N^2)
comparisons to find the two entries that are furthest apart in hemdist
terms:
for (k = FirstOffsetNumber; k < maxoff; k = OffsetNumberNext(k))
{
for (j = OffsetNumberNext(k); j <= maxoff; j = OffsetNumberNext(j))
{
if (k == FirstOffsetNumber)
fillcache(&cache[j], GETENTRY(entryvec, j));
size_waste = hemdistcache(&(cache[j]), &(cache[k]));
if (size_waste > waste)
{
waste = size_waste;
seed_1 = k;
seed_2 = j;
}
}
}
I wonder if there is a way to improve on that.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Jim C. Nasby | 2006-01-20 22:00:40 | Re: Page-Level Encryption |
Previous Message | Rick Gigger | 2006-01-20 21:49:12 | panic on 7.3 |
From | Date | Subject | |
---|---|---|---|
Next Message | Jignesh K. Shah | 2006-01-20 22:13:58 | Re: Sudden slowdown of Pg server |
Previous Message | Steinar H. Gunderson | 2006-01-20 21:44:02 | Re: [GENERAL] Creation of tsearch2 index is very slow |