Re: Storing database in cluster (Memory)

From: Richard Huxton <dev(at)archonet(dot)com>
To: roopa perumalraja <roopabenzer(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing database in cluster (Memory)
Date: 2007-02-08 08:43:45
Message-ID: 45CAE2C1.1050801@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

roopa perumalraja wrote:
>>> Hi all,
>>>
>>> As I am inserting 100million rows daily into partitioned tables
>>> (daily wise), it is getting slower.
>
>> What is - the inserts? By how much? What tables? What indexes? How
>> are you inserting these rows?
>
> I take my words back as 100million rows. The insert of 20million rows
> everyday takes only 10minutes as I use copy statement to copy into
> temperory table from flat files then do some manipulation to the data
> & insert it into the paritioned tables. I have solved the problem. I
> have paritioned the tables date-wise. (Is partitioning the tables
> monthly is recommanded?)

Depends upon your usage - some people partition daily if they have a lot
of incoming data.

> The proble is with another insert, selecting data from one of the
> paritioned tables and doing some calculations then inserting into
> another table. That is around 280000 rows every day. This takes
> really a long time (almost a day) if somebody is doing something with
> the database. If nobody is using the database then it takes almost
> two hours. Even a select statement to other tables in the datbase
> affects this insert. While inserting I use
>
> BEGIN WORK SELECT foo1 LOCK TABLE foo2 IN EXCLUSIVE MODE INSERT INTO
> foo2 (SELECT......) COMMIT WORK
>
> All the tables are indexed. I am using 4 indexes including the pkey
> index. Will dropping the index before inserting and reindexing it
> after the insert will help?

Maybe. It is often quicker to recreate an index if you are updating lots
of rows.

>>> Even the retrivel of data, select statement on those tables takes
>>> about 30 mintues.
>
>> All selects take 30 minutes, regardless what work they do? Or do
>> you have specific selects that are causing problems?
>
> Any statement more than one running on database takes time. Say for
> example if I am inserting 20million rows into one table & at the same
> time if I try updating into another table, that takes a lot of time.

You haven't given details of the select statements, or what the
bottleneck is in your system. It sounds like it might be disk I/O. Have
you tried monitoring activity with vmstat/iostat (assuming you're on a
Linux/Unix system)?

>>> I have tried increasing the parameters in postgres.conf but still
>>> that doesn't help me much as the no of rows are huge.
>
>> What parameters, to what values? What hardware are you running on?
>> What load do you place on it?
>
> parameters in postgres.conf which I increased are
>
> fsysn on
> wal_buffers 128
> checkpoint_segments 256 (2 GB)
> checkpoint_timeout 3600 (1 hour)

How did you calculate these? Is this based on your bulk data load.

> work_mem: set to 128MB
> maintenance_work_mem: to 512MB

I assume you have enough memory to support these settings? Something
over 4GB, yes?

> I wanted to increase shared_buffer to 60,000 but I am not able to
> restart the database if I change it even to 10000. It says 'could not
> start postmaster'. (max_connection is 100.)

"It" says? What do the logs say?

>>> Will the idea of storing the database in cluster (memory)
>>> increase the performance of insert/update/select in the table in
>>> a suitation like this? Thanks a lot in advance.
>
>> A better question would be "Why isn't my data being cached?". The
>> operating-system should cache regularly used files.
>
> So I think the problem is when two things are happening in database,
> it takes a long time. Thats the reason I was wondering if storing the
> database in cluster will solve the problem.

The data should be cached anyway. That's the point.

You still haven't told us any of the information needed to help you.
Some of the questions we still need answers for:
1. What queries are giving you trouble, and how many rows do they deal with?
2. What hardware is the database running on?
3. What load is on the system (try "vmstat 10" or similar)

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2007-02-08 08:47:23 Re: SQL textbook
Previous Message Malcolm McLean 2007-02-08 08:16:06 Persistent dead rows