From: | Lynn(dot)Tilby(at)asu(dot)edu |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | Christian Fowler <google(at)NOSPAM(dot)gravesweeper(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Making a tree with "millions and millions" of dynamic |
Date: | 2003-12-07 23:57:50 |
Message-ID: | 1070841470.3fd3be7e2e2e4@webmail.asu.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Oleg,
I worked for a company where we crunched 16+ terabytes of
multiple drug company sales data per month. We of course
used Sun's BIGGEST boxes and Oracle. But! the secret
to doing it was LOTS of memory and putting the indexes,
pointers, etc. in arrays and using Oracle OCI calls.
If it is possible to use some lower level funcs, perhaps
funcs like you see in the .c output of ecpg this might
theoretically be possible in postgres. I know that
putting the index references or functionally similar data
in arrays, and doing the data location resolution to
the logical or even physical disk location is not only
extreamly FAST but possible, having done it on raw
partitions and optical disks myself in custom designed
data bases. This approach would work extreamly fast
with only 20 million rows. Yes, it would require some
extra programming but if you need REAL TIME response
this approach could help solve your problem.
Best Regards,
Lynn P. Tilby
Unix Consultant
ltilby(at)asu(dot)edu
Ph: 480 632-8635
Quoting Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>:
> Christian,
>
> you may try our contrib/ltree module
> see http://www.sai.msu.su/~megera/postgres/gist/ltree
> With tuned postgresql and reasonable hardware you might get what
> you're
> looking for.
>
> Oleg
>
> On Tue, 2 Dec 2003, Christian Fowler wrote:
>
> >
> >
> > I have a VERY LARGE pile of geographic data that I am importing into a
> database (db of choice is postgres,
> > though may hop to oracle if necessary). The data is strictly
> hierarchical - each node has one, and only one
> > parent. The depth should not exceed 6 or 7 levels. The initial import
> will have about 6 million leaves, and
> > 3 million branches. I would expect the leaves to grow significantly,
> in number easily tripling. However, the
> > branches will likely stay very constant in number, but I expect there
> locations to shift around somewhat
> > (affecting up to thousand of children).
> >
> > For selection, it is crucial for me to get:
> >
> > 1. path generation speed
> > 2. immediate sibling speed
> > 3. children count speed
> >
> >
> > I have implemented a first run using Joe Celko's Nested Sets (w/ a mod
> to store tree level for speed). The
> > update propigation issue is the achilles heel of this approach for me.
> I have read Vadim Tropashko Nested
> > Interval concept ( http://www.dbazine.com/tropashko4.html ) , and my
> brain is painfully stretched enough to
> > get the general idea. I have a feeling i will hit the arithmetic
> issues others of reported.
> >
> > So it seems Materialized Path is my only option, however I am
> concerned about LIKE performance for the right
> > hand side of the tree, where the path is 8digits x 6 levels = 48
> chars. Should I be concerned? I need
> > split-second real-time performance, and can't imagine it will be as
> fast the Nested Set arithmatic approach.
> > I can flatten out the import to insure the upper tree has the smallest
> numbers, however, it will save at
> > most 8 chars on the path.
> >
> > I've been googling through USENET archives watching the big debates of
> years gone by. I've grazed much
> > knowledge, but now am curious if anyone has any thoughts/advice/war
> stories about a data set this large.
> >
> > (and if any fellow postgres fans have some reassuring words about
> real-time performance of a 20 million row
> > tree, i'd love to hear ;-)
> >
> >
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>
From | Date | Subject | |
---|---|---|---|
Next Message | cadancer | 2003-12-08 00:30:30 | Help with Turning debugging on/off |
Previous Message | George Essig | 2003-12-07 23:40:35 | CMS with PostgreSQL |