Re: Optimization, etc

From: Jason Earl <jason(dot)earl(at)simplot(dot)com>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: Jeff Sack <sackj(at)alum(dot)rpi(dot)edu>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Optimization, etc
Date: 2001-11-09 18:22:06
Message-ID: 87itcjby9d.fsf@npa01zz001.simplot.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:

> On Fri, 9 Nov 2001, Jeff Sack wrote:
>
> > To find the names of the single season home run leaders, along with the
> > total number of home runs, the team name/city and the year:
> >
> > select P.first_name, P.last_name, B.hr, T.name, T.city, S.year from
> > statistics S, batting_stats B, players P, teams T
> > where (S.id=B.id) and
> > (S.player_id=P.id) and
> > (B.hr>30) and
> > (T.id=S.team_id) limit 10;
> >
> > You get the idea. These queries take a while. Is this just the way it
> > is or there things that can be done to optimize this?
>
> As a starting point, have you run vacuum analyze and what does
> explain show for the query. Also, do you have indexes on fields that
> you're limiting on (like hr, etc...).
>
> > One separate issue (the reason why the above examples are all about
> > batting statistics) I'm having is representing the innings pitched
> > statistic. The way it is often represented (and the way it is done in
> > this schema) is something like this 123.0 means exactly 123 innings
> > pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings
> > pitched. I'm contemplating the best way to write a function that knows
> > how to sum these values accurately. Is this something that can be done
> > with PL/PGSQL or should I go straight to something like PLPERL?
> > Alternatively, I could research a way to represent fractions in the DB
> > and write a script to convert all values in this column. Any advice
> > here??
>
> You'd probably be best off doing the conversion at insert time into an
> additional field assuming that inserts are much less likely that
> selects on your data.
>
> (col-floor(col)*(10/3::numeric) seems to get back an appropriate value
> but is probably reasonably expensive.

Another thought would be to simply store this as an integer that is
the number of thirds of an inning that the pitcher pitched. In other
words your 123.1 would be stored as 370. That would allow you a very
easy way to manipulate these values mathematically, and it would be
easy to write a simple function to format these values so that your
uses would see the 123.1 that they expect.

Heck something like:

SELECT (370 / 3)::text || '.' || (370 % 3)::text;

would do exactly what you want.

Throw in some indexes like Josh and Stephen suggest and you'll be
cooking with gasoline.

You also might want to take a look at Bruce Momjian's article about
performance tuning:

http://www2.linuxjournal.com/lj-issues/issue88/4791.html

Jason

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message fstelpstra@yahoo.com 2001-11-09 21:30:42 Re: Design Tool for postgresql
Previous Message Jeff Eckermann 2001-11-09 16:35:42 Re: substring replacement