Re: Large DB

From: Ericson Smith <eric(at)did-it(dot)com>
To: "Gavin M(dot) Roy" <gmr(at)ehpg(dot)net>
Cc: Bob(dot)Henkel(at)hartfordlife(dot)com, Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql-general(at)postgresql(dot)org, pgsql-general-owner(at)postgresql(dot)org, "Mooney, Ryan" <ryan(dot)mooney(at)pnl(dot)gov>
Subject: Re: Large DB
Date: 2004-04-12 02:48:05
Message-ID: 407A0365.8080106@did-it.com
Lists: pgsql-general

My point was that if multi-minute query response times are OK, then you
can continue as is. But if the row sizes are really large, or you have
lots of text data, or there are tons of updates, and you want sub-second
response times, then I see no option but to beef up the hardware
dramatically, or to split the data into smaller tables that you can
still query logically.

That's what we were forced to do -- and trust me, we looked at our
schemas for over a year before deciding to do it. It paid off
handsomely. It's not as horrible an alternative as you might think.

- Ericson Smith

Gavin M. Roy wrote:

> I don't think that's what's being said; I run tables with tens of
> millions of rows without issue.
>
> Gavin
>
> Bob(dot)Henkel(at)hartfordlife(dot)com wrote:
>
>> I'm a fairly new PostgreSQL user, and a long-time Oracle user, so
>> keep that in mind.
>>
>> So you are telling me that for 206238 rows on a table the best route
>> is to break the table into separate tables? To me that is horrible to
>> have to do unless you really wanted to do that for some reason. And
>> unless the count is way higher than I read, how is 206238 a large
>> amount of rows on a table?
>>
>>
>> |---------+---------------------------------->
>> | | Ericson Smith |
>> | | <eric(at)did-it(dot)com> |
>> | | Sent by: |
>> | | pgsql-general-owner(at)pos|
>> | | tgresql.org |
>> | | |
>> | | |
>> | | 04/06/2004 08:27 AM |
>> | | |
>> |---------+---------------------------------->
>> >------------------------------------------------------------------------------------------------------------------------------|
>>
>> |
>> |
>> | To: Manfred Koizar
>> <mkoi-pg(at)aon(dot)at>
>> |
>> | cc: "Mooney, Ryan" <ryan(dot)mooney(at)pnl(dot)gov>,
>> pgsql-general(at)postgresql(dot)org |
>> | Subject: Re: [GENERAL] Large
>> DB
>> |
>> >------------------------------------------------------------------------------------------------------------------------------|
>>
>>
>>
>>
>>
>> I've been following this thread with interest since it started, and it
>> really seems that there is just too much data in that single table. When
>> it comes down to it, making smaller separate tables seems to be the way
>> to go. Querying will be a little harder, but much faster.
>>
>> Warmest regards,
>> Ericson Smith
>> Tracking Specialist/DBA
>> +-----------------------+---------------------------------+
>> | http://www.did-it.com | "When you have to shoot, shoot, |
>> | eric(at)did-it(dot)com | don't talk! - Tuco |
>> | 516-255-0500 | |
>> +-----------------------+---------------------------------+
>>
>>
>>
>> Manfred Koizar wrote:
>>
>>
>>
>>> On Sat, 03 Apr 2004 22:39:31 -0800, "Mooney, Ryan"
>>> <ryan(dot)mooney(at)pnl(dot)gov>
>>> wrote:
>>>
>>>
>>>
>>>
>>>> Ok, so I ran a vacuum analyse. It took ~1.7 days to finish.
>>>>
>>>>
>>>>
>>>
>>> Just to make it clear: VACUUM and ANALYSE are two different commands.
>>>
>>> VACUUM is for cleaning up. It has to visit every tuple in every page,
>>> and if there are dead row versions, it also has to scan all indices
>>> belonging to the table. If there are lots of deleted rows and
>>> vacuum_mem is too small, VACUUM has to scan each index several times to
>>> delete all index entries pointing to dead rows. This might raise the
>>> cost to even more than O(tuplecount).
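As a sketch of the tuning Manfred describes (the value is illustrative,
and on PostgreSQL 8.0 and later the setting is named maintenance_work_mem):

```sql
-- Let VACUUM remember more dead row versions at once (value in KB),
-- so each index needs to be scanned only one time.
SET vacuum_mem = 65536;

-- VERBOSE reports, per table and index, how many tuples were removed.
VACUUM VERBOSE tp3;
```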
>>>
>>> ANALYSE collects a random sample of the rows in the table, the sample
>>> size depends on default_statistics_target and the maximum value you
>>> have
>>> set any column's statistics target to (ALTER TABLE ... ALTER COLUMN ...
>>> SET STATISTICS ...). If you didn't touch either, the sample size is
>>> 3000 rows. Then these 3000 rows are sorted and counted in different
>>> ways to generate statistics.
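For illustration (the column name is a guess), the per-column statistics
target is set like this; the sample size is roughly 300 times the largest
target in effect, so the default target of 10 yields the 3000-row sample:

```sql
-- Raise the statistics target for one column (default is 10).
ALTER TABLE tp3 ALTER COLUMN starttime SET STATISTICS 100;

-- Re-collect statistics with the larger sample.
ANALYZE tp3;
```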
>>>
>>> The number of pages that have to be touched to collect the sample
>>> depends on the table size, but it grows far less than proportionally
>>> to the number of pages, nblocks. The cost growth rate is greater than
>>> O(ln(nblocks)) and significantly less than O(nblocks). I have no
>>> simple formula for it, but I estimate that analysing your tp3 table
>>> would need between 28000 and 30000 page reads, which should be doable
>>> in a few minutes.
>>>
>>> VACUUM ANALYSE is just VACUUM followed by ANALYSE, with the added
>>> benefit that the number of rows does not have to be estimated by
>>> ANALYSE, because VACUUM knows the exact value.
>>>
>>>
>>>
>>>
>>>
>>>> The invalid page block was caused when I tried the 2.6 kernel
>>>> (for other reasons than DB performance); it's been there for a while,
>>>> and I can deal w/ the data loss
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>> ERROR: invalid page header in block 10257032 of "tp3_point_starttime"
>>>>
>>>>
>>>>
>>>
>>> AFAICS the invalid page is in an index, so there is no data loss. You
>>> could simply drop and re-create that index. That might take some time,
>>> though :-(
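A sketch of that repair (the column list is inferred from the index name
and may differ from the real definition):

```sql
-- Rebuild the corrupted index from the (intact) heap data.
DROP INDEX tp3_point_starttime;
CREATE INDEX tp3_point_starttime ON tp3 (point, starttime);

-- Alternatively, REINDEX rebuilds it in place:
-- REINDEX INDEX tp3_point_starttime;
```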
>>>
>>>
>>>
>>>
>>>
>>>> Here is the explain analyse; you can see why I think that an index
>>>> on just host might be better - hosts are a small set, starttime is
>>>> a large set, so the index should be more efficient,
>>>>
>>>>
>>>>
>>>
>>> I think you got that backwards. If there are not many hosts, then an
>>> index on host is not very selective, IOW you get a lot of hits when you
>>> look for a particular host. OTOH if you select a sufficiently small
>>> starttime interval, you get only a few rows, so using an index is most
>>> efficient.
>>>
>>>
>>>
>>>
>>>
>>>> at the very least it should be (starttime, host), not (host,
>>>> starttime) unless
>>>> the indexing engine is smart enough to make that not matter (I suspect
>>>> its not???).
>>>>
>>>>
>>>>
>>>
>>> Yes, it should be (starttime, host). And no, PG is generally not smart
>>> enough to use an index if there is no condition on the first index
>>> column.
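A sketch of the suggested index, and a query shape that can use it (the
time interval and host value are illustrative):

```sql
-- Put the selective column first, so a narrow time-range condition
-- drives the index scan and only matching index tuples are touched.
CREATE INDEX tp3_starttime_host ON tp3 (starttime, host);

SELECT *
FROM tp3
WHERE starttime >= '2004-04-01 00:00'
  AND starttime <  '2004-04-02 00:00'
  AND host = 'somehost';
```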
>>>
>>>
>>>
>>>
>>>
>>>> -> Index Scan using tp3_host_starttime, tp3_host_starttime,
>>>> [...], tp3_host_starttime on tp3
>>>> (cost=0.00..195.07 rows=1 width=25) (actual time=59.42..96366.43
>>>> rows=206238 loops=1)
>>>>
>>>>
>>>>
>>>
>>> Estimated number of rows: 1
>>> Actual number of rows: 206238
>>> The planner is way off here. Fortunately your huge number of rows
>>> makes it rule out every other (most probably slower) plan.
>>>
>>> How many hosts are there? Even if there are a few hundred, an index
>>> scan with that condition has to access and skip millions of index
>>> tuples. An index on (starttime, host) would visit fewer index tuples,
>>> and would more likely access the heap tuples in physical order.
>>>
>>>
>>>
>>>
>>>
>>>> Having followed the ongoing discussion about this, I can concur that
>>>> it is
>>>> definitely NOT O(1). Unfortunately I didn't track the "time to
>>>> vacuum"
>>>>
>>>>
>>>>
>>>
>>> The discussion on -hackers and the patch I posted earlier today are
>>> about ANALYSE, not VACUUM.
>>>
>>>
>>>
>>>
>>>
>>>> However I believe that I'm going to follow the suggestions
>>>> about reducing the table size, so I'll have a brand new DB to play with
>>>> in a couple weeks,
>>>>
>>>>
>>>>
>>>
>>> Hopefully we'll see a success story here.
>>>
>>>
>>>
>>>
>>>
>>>> so knowing what I know now, I can track that if anyone's
>>>> interested in the data besides me :)
>>>>
>>>>
>>>>
>>>
>>> VACUUM and ANALYSE times? Certainly.
>>>
>>> Servus
>>> Manfred
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 3: if posting/reading through Usenet, please send an appropriate
>>> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>>> message can get through to the mailing list cleanly
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
