Re: Large DB

From: "Mooney, Ryan" <ryan(dot)mooney(at)pnl(dot)gov>
To: Bob(dot)Henkel(at)hartfordlife(dot)com, Ericson Smith <eric(at)did-it(dot)com>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql-general(at)postgresql(dot)org, pgsql-general-owner(at)postgresql(dot)org
Subject: Re: Large DB
Date: 2004-04-06 22:57:09
Message-ID: B295F3D09B0D3840BEA3B46C51FC389C1F5C0F@pnlmse28.pnl.gov
Lists: pgsql-general


Actually, in the case referenced I have over 1.4 billion rows, so the
206238-row example should be no problem. I had no realistic
performance issues whatsoever (given disk I/O bandwidth, etc.) in the
hundreds of millions of rows, even with sub-optimal indexes. My
average query has to reference 2-30 million rows, so yes, that can
take up to a few minutes, but even dd'ing that amount of data to/from
the disk array took almost as long. That's what I used as my baseline
benchmark; if my query time approaches it, I don't see any room to
complain :>

I could only really see 206K rows being a major issue IF the rows
were really large AND the queries were really complex AND the indexes
were really poorly defined, or the hardware was REALLY slow.
Unfortunately I didn't follow that thread as closely, so I don't
remember exactly what issues were raised there.

> -----Original Message-----
> From: Bob(dot)Henkel(at)hartfordlife(dot)com
> [mailto:Bob(dot)Henkel(at)hartfordlife(dot)com]
> Sent: Tuesday, April 06, 2004 2:17 PM
> To: Ericson Smith
> Cc: Manfred Koizar; pgsql-general(at)postgresql(dot)org;
> pgsql-general-owner(at)postgresql(dot)org; Mooney, Ryan
> Subject: Re: [GENERAL] Large DB
>
> I'm a fairly new PostgreSQL user and a long-time Oracle
> user, so keep that in mind.
>
> So you are telling me that for 206238 rows on a table the
> best route is to break the table into separate tables? To me
> that is horrible to have to do unless you really wanted to do
> that for some reason. And unless the count is way higher
> than I read, how is 206238 a large amount of rows for a table?
>
>
> On 04/06/2004 08:27 AM, Ericson Smith <eric(at)did-it(dot)com>
> (sent via pgsql-general-owner(at)postgresql(dot)org) wrote:
>
> To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
> cc: "Mooney, Ryan" <ryan(dot)mooney(at)pnl(dot)gov>, pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Large DB
>
> I've been following this thread with interest since it
> started, and it really seems that there is just too much data
> in that single table. When it comes down to it, making
> smaller separate tables seems to be the way to go. Querying
> will be a little harder, but much faster.
>
> Warmest regards,
> Ericson Smith
> Tracking Specialist/DBA
> +-----------------------+---------------------------------+
> | http://www.did-it.com | "When you have to shoot, shoot, |
> | eric(at)did-it(dot)com | don't talk! - Tuco |
> | 516-255-0500 | |
> +-----------------------+---------------------------------+
>
>
>
> Manfred Koizar wrote:
>
> >On Sat, 03 Apr 2004 22:39:31 -0800, "Mooney, Ryan"
> ><ryan(dot)mooney(at)pnl(dot)gov>
> >wrote:
> >
> >
> >>Ok, so I ran a vacuum analyse. It took ~1.7 days to finish.
> >>
> >>
> >
> >Just to make it clear: VACUUM and ANALYSE are two different
> commands.
> >
> >VACUUM is for cleaning up. It has to visit every tuple in
> every page,
> >and if there are dead row versions, it also has to scan all indices
> >belonging to the table. If there are lots of deleted rows and
> >vacuum_mem is too small, VACUUM has to scan each index
> several times to
> >delete all index entries pointing to dead rows. This might
> raise the
> >cost to even more than O(tuplecount).
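
A minimal sketch of what that means in practice on a 7.x-era server,
where vacuum_mem is the relevant knob (the table name tp3 is from the
thread; the memory value is only illustrative):

    -- Give VACUUM enough memory to remember all dead-tuple pointers in
    -- one pass, so each index is scanned once instead of several times.
    SET vacuum_mem = 131072;   -- in KB; illustrative, tune to available RAM
    VACUUM VERBOSE tp3;        -- VERBOSE shows how much work each index took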
> >
> >ANALYSE collects a random sample of the rows in the table;
> the sample
> >size depends on default_statistics_target and the maximum value you
> >have set any column's statistics target to (ALTER TABLE ... ALTER
> >COLUMN ... SET STATISTICS ...). If you didn't touch either,
> the sample
> >size is 3000 rows. Then these 3000 rows are sorted and counted in
> >different ways to generate statistics.
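
For reference, a sketch of how the sample can be made larger for one
column (the column name is only a guess; ANALYSE samples roughly 300
rows per unit of statistics target, so the default target of 10 gives
the 3000-row sample mentioned above):

    -- A larger statistics target means a larger sample and usually
    -- better row-count estimates for skewed data.
    ALTER TABLE tp3 ALTER COLUMN starttime SET STATISTICS 100;
    ANALYZE tp3;   -- ANALYSE is accepted as an alternative spelling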
> >
> >The number of pages that have to be touched to collect the sample
> >depends on the table size, but it grows far less than
> proportionally
> >to the number of pages, nblocks. The cost growth rate is
> greater than
> >O(ln(nblocks)) and significantly less than O(nblocks). I have no
> >simple formula for it, but I estimate that analysing your tp3 table
> >would need between 28000 and 30000 page reads, which should
> be doable
> >in a few minutes.
> >
> >VACUUM ANALYSE is just VACUUM followed by ANALYSE with the added
> >benefit, that the number of rows does not have to be estimated by
> >ANALYSE, because VACUUM knows the exact value.
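
In other words, the combined form is a single statement along these
lines:

    VACUUM ANALYZE tp3;   -- cleanup first, then statistics with an exact row count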
> >
> >
> >
> >>The invalid page block was caused when I tried the 2.6 kernel (for
> >>other reasons than DB performance), it's been there for a
> while, and I
> >>can deal w/ the data loss
> >>
> >>
> >
> >
> >
> >>ERROR: invalid page header in block 10257032 of
> "tp3_point_starttime"
> >>
> >>
> >
> >AFAICS the invalid page is in an index, so there is no data
> loss. You
> >could simply drop and re-create that index. That might take
> some time,
> >though :-(
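
That would look something like the following, assuming
tp3_point_starttime is an ordinary btree index (the column list is
only a guess based on the index name):

    DROP INDEX tp3_point_starttime;
    CREATE INDEX tp3_point_starttime ON tp3 (point, starttime);

    -- REINDEX INDEX tp3_point_starttime;  -- alternative: rebuild in place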
> >
> >
> >
> >>Here is the explain analyse; you can see why I think that
> an index on
> >>just host might be better - hosts are a small set, starttime is a
> >>large set, so the index should be more
> >>efficient.
> >>
> >>
> >
> >I think you got that backwards. If there are not many
> hosts, then an
> >index on host is not very selective, IOW you get a lot of
> hits when you
> >look for a particular host. OTOH if you select a sufficiently small
> >starttime interval, you get only a few rows, so using an
> index is most
> >efficient.
> >
> >
> >
> >>at the very least it should be (starttime, host), not (host,
> >>starttime) unless
> >>the indexing engine is smart enough to make that not matter
> (I suspect
> >>it's not???).
> >>
> >>
> >
> >Yes, it should be (starttime, host). And no, PG is
> generally not smart
> >enough to use an index if there is no condition on the first index
> >column.
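
A sketch of the suggested index and the kind of predicate that lets
it be used (table and column names are from the thread; the host
value and time interval are made up):

    CREATE INDEX tp3_starttime_host ON tp3 (starttime, host);

    -- Can use the index: there is a condition on starttime, the
    -- leading column, and host narrows it further.
    SELECT * FROM tp3
    WHERE starttime >= '2004-04-01' AND starttime < '2004-04-02'
      AND host = 'somehost';

    -- Generally cannot use it: no condition on the leading column.
    SELECT * FROM tp3 WHERE host = 'somehost';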
> >
> >
> >
> >> -> Index Scan using tp3_host_starttime, tp3_host_starttime,
> >> [...], tp3_host_starttime on tp3
> >>(cost=0.00..195.07 rows=1 width=25) (actual time=59.42..96366.43
> >>rows=206238 loops=1)
> >>
> >>
> >
> >Estimated number of rows: 1
> >Actual number of rows: 206238
> >The planner is way off here. Fortunately your huge number of rows
> >makes it rule out every other (most probably slower) plan.
> >
> >How many hosts are there? Even if there are a few hundred, an index
> >scan with that condition has to access and skip millions of index
> >tuples. An index on (starttime, host) would visit fewer
> index tuples,
> >and would more likely access the heap tuples in physical order.
> >
> >
> >
> >>Having followed the ongoing discussion about this I can
> concur that it
> >>is definitely NOT O(1). Unfortunately I didn't track the "time to
> >>vacuum"
> >>
> >>
> >
> >The discussion on -hackers and the patch I posted earlier today are
> >about ANALYSE, not VACUUM.
> >
> >
> >
> >>However I believe that I'm going to follow the suggestions about
> >>reducing the table size so I'll have a brand new DB to play
> with in a
> >>couple weeks,
> >>
> >>
> >
> >Hopefully we'll see a success story here.
> >
> >
> >
> >>so knowing what I know now, I can track that if anyone's
> interested in
> >>the data besides me :)
> >>
> >>
> >
> >VACUUM and ANALYSE times? Certainly.
> >
> >Servus
> > Manfred
> >
