Re: benchmarking the query planner

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: benchmarking the query planner
Date: 2008-12-11 18:09:28
Message-ID: 9565.1229018968@sss.pgh.pa.us
Lists: pgsql-hackers

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
> Ah, that makes sense. Here's a test case based on Greg's. This is
> definitely more than linear once you get above about n = 80, but it's
> not quadratic either. n = 1000 is only 43x n = 80, and while that's
> surely more than 1000/80 = 12.5, it's also a lot less than (1000/80)^2
> = 156.25.

Yeah, that's not too surprising. There's a linear cost associated with
fetching/deconstructing the stats array, and a quadratic cost in the
comparison loop. What these results say is that the upper limit of 1000
keeps you from getting into the range where the linear component is
completely negligible. If you plot the results and try to fit a curve
like c1*x^2 + c2*x to them, you get a very good fit for all the points
above about 80. Below that, the curve has a flatter slope, indicating
a smaller linear cost component. The values in this test are about 100
bytes each, so the breakpoint at 80 seems to correspond to what happens
when the stats array no longer fits in one page (80 entries at ~100 bytes
apiece is roughly 8kB, the default block size).
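
(Purely as an illustration of that fit, and not part of the original test:
c1 and c2 can be recovered from the n >= 90 rows of the timing table below
with PostgreSQL's built-in regression aggregates, since dividing the model
t = c1*n^2 + c2*n through by n gives the straight line t/n = c1*n + c2.)

    -- Illustrative sketch only: fit t = c1*n^2 + c2*n to the n >= 90 timings
    -- reported below by regressing t/n on n (slope = c1, intercept = c2).
    WITH timings(n, t) AS (VALUES
      (90, 4.686), (100, 6.418), (150, 10.016), (200, 13.598), (250, 17.905),
      (300, 22.777), (400, 33.471), (500, 46.394), (600, 61.185),
      (700, 77.664), (800, 96.304), (900, 116.615), (1000, 140.117))
    SELECT regr_slope((t / n)::float8, n::float8)     AS c1,  -- quadratic coefficient
           regr_intercept((t / n)::float8, n::float8) AS c2   -- linear coefficient
    FROM timings;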

I replicated your test and got timings like these, using CVS HEAD with
cassert off:

10 1.587
20 1.997
30 2.208
40 2.499
50 2.734
60 3.048
70 3.368
80 3.706
90 4.686
100 6.418
150 10.016
200 13.598
250 17.905
300 22.777
400 33.471
500 46.394
600 61.185
700 77.664
800 96.304
900 116.615
1000 140.117

So this machine is a bit slower than yours, but otherwise it's pretty
consistent with your numbers. I then modified the test to use

array[random()::text, random()::text, random()::text,
      random()::text, random()::text, random()::text]

i.e., the same data except stuffed into an array. I did this because
I know that array_eq is pretty durn expensive, and indeed:

10 1.662
20 2.478
30 3.119
40 3.885
50 4.636
60 5.437
70 6.403
80 7.427
90 8.473
100 9.597
150 16.542
200 24.919
250 35.225
300 47.423
400 76.509
500 114.076
600 157.535
700 211.189
800 269.731
900 335.427
1000 409.638

When looking at these numbers one might think the threshold of pain
is about 50, rather than 100, which is where I'd put it for the text
example. However, this is probably an extreme worst case.

On the whole I think we have some evidence here to say that upping the
default value of default_stats_target to 100 wouldn't be out of line,
but 1000 definitely is. Comments?
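
(As an aside for anyone who wants to try the proposed default on an existing
installation before anything changes: the GUC's full name is
default_statistics_target, and it can be tested per-session, e.g.

    -- Try the proposed default in one session and re-gather stats with it;
    -- columns with an explicit per-column target keep their own setting.
    SET default_statistics_target = 100;
    ANALYZE;

which makes it easy to compare plan quality and ANALYZE time before and after.)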

BTW, does anyone have an opinion about changing the upper limit for
default_stats_target to, say, 10000? These tests suggest that you
wouldn't want such a value for a column used as a join key, but
I can see a possible argument for high values in text search and
similar applications.
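
(If the cap were raised, the more natural way to use a very high target would
probably be per-column rather than via the database-wide default. A purely
hypothetical sketch, assuming the limit were at least 10000; the table and
column names here are made up:

    -- Hypothetical: a high target for a single text-search column only
    ALTER TABLE documents ALTER COLUMN body_tsv SET STATISTICS 10000;
    ANALYZE documents;

so join-key columns elsewhere would stay at the cheaper default.)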

regards, tom lane
