Quick Links

Re: which is more scalable for the database?

From:	"Timasmith" <timasmith(at)hotmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: which is more scalable for the database?
Date:	2007-03-13 03:39:10
Message-ID:	1173757150.216710.220780@30g2000cwc.googlegroups.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Mar 8, 2:26 pm, p(dot)(dot)(dot)(at)Sheeky(dot)Biz (Shane Ambler) wrote:
> Timasmithwrote:
> > Suppose I have a database table with 20 fields which are lookups to a
> > single table.
>
> > configtable(configtable_id, a_field, something_lookup_id,
> > another_lookup_id, ...)
> > lookup(lookup_id, value, description, ...)
>
> > what is going to be faster to map the rows to an Object which needs
> > the 'value' for every field ending in lookup_id
>
> How long is ball of string 'a' compared to ball of string 'b'?
>
> That will depend on a lot on how large the lookup table will be (100
> rows or 10 million rows?)

100-500,000 rows

and on how long your descriptions are.

10-60, perhaps averaging around 20 characters

Will
> you want all 20 descriptions for each query you run?

Yes, it is the price I pay with Hibernate for vastly simplifying
data access

If your
> descriptions are 200 characters each then maybe you can reduce some data
> transfer by caching these at the client instead of transferring them
> each time you retrieve a result. But then how will you keep your local
> cache up to date?

The data I would cache would change infrequently in the production
environment and there would be
no expectation it would be 'immediate' on change.
I would provide services to flush on demand, or on save (of
reference data), and/or through a scheduler etc.

>
> I would suggest the view for a couple of reasons - first it will
> simplify your queries as you won't need all the join details in every
> query you run. Second the select for the query can be parsed and stored
> whereas separate selects will be parsed each time.

If I was caching on the client, I would select all data only once
and there would be no queries, it would only be
select * from sometable
and the client (actually server side bean), would populate the
object with the missing lookups.

>
> A lot of this will depend on what you are doing and what you are using
> to do it. If you are using php then you have less options for caching
> than say a client program written in c. php may run pretty fast but it
> won't outrun a compiled c program.

Java hashtable.

>
>
>
>
>
> > a) View
>
> > select c.configtable_id, l1.value as something_lookup, l2.value as
> > another_lookup
> > from configtable c,
> > lookup l1,
> > lookup l2
> > where c.something_lookup_id = l1.lookup_id
> > and c.another_lookup_id = l2.lookup_id
>
> > foreach row
> > map values to object
> > end
>
> > b) Cache all lookup values and populate
>
> > select c.* from configtable
>
> > foreach row
> > map values to object
> > if lookup_id
> > find value from hashtable and map value to object
> > endif
> > end
>
> > It seems that the latter *might* be better to scale outward better,
> > as
> > you could add application servers to do the caching/mapping and you
> > only select from a single table?
>
> Maybe but then postgresql may do the lookups quicker than what you have
> available at the client end.

Right, but could it keep up, how much extra effort is it to do the
indexed lookups (almost all in memory), for the client.

>
> The only way you will really know is to load up some sample data and
> test each method yourself.

Its very hard to simulate though due to production hardware having 16
cpus, Gigs of memory, huge databases, hundereds of concurrent users -
I just dont have access to that kind of environment.

>
> How far are you expecting to scale? The simplest method may just work
> fast enough that all the extra caching and setup/maintenance of this
> will outweigh any benefit.

I would like to scale to 10,000s of power users on a complex OLTP
system. Adding servers to expand out is generally feasible, but one
can only scale upward so far before reaching a hardware peak.

>
> --
>
> Shane Ambler
> p(dot)(dot)(dot)(at)Sheeky(dot)Biz
>
> Get Sheeky @http://Sheeky.Biz
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match- Hide quoted text -
>
> - Show quoted text -

In response to

Re: which is more scalable for the database? at 2007-03-08 19:26:40 from Shane Ambler

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2007-03-13 03:57:08	Re: Tracking disk writes? (again)
Previous Message	Alvaro Herrera	2007-03-13 03:30:27	Re: Tracking disk writes? (again)