Re: two memory-consuming postgres processes

From: PFC <lists(at)peufeu(dot)com>
To: "Alexy Khrabrov" <deliverable(at)gmail(dot)com>, "Craig James" <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: two memory-consuming postgres processes
Date: 2008-05-02 21:30:44
Message-ID: op.uajp1iotcigqcu@apollo13.peufeu.com
Lists: pgsql-performance


> I created several indices for the primary table, yes. Sure I can do a
> table for a volatile column, but then I'll have to create a new such
> table for each derived column -- that's why I tried to add a column to
> the existing table. Yet seeing this is really slow, and I need to do
> many derived analyses like this -- which are later scanned in other
> computations, so should persist -- I indeed see no other way but to
> procreate derived tables with the same key, one column per each...

OK. In that case, you could do all of your derived column
calculations in one query, like this:

CREATE TABLE derived AS SELECT ... FROM ... (perform all your derived
calculations here)
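
For concreteness, here's a sketch of what that could look like (big_table
and the columns a and b are made-up names, not your actual schema):

CREATE TABLE derived AS
    SELECT id,
           a + b            AS sum_ab,    -- first derived column
           ln(a + 1)        AS log_a,     -- second derived column
           a / nullif(b, 0) AS ratio_ab   -- third, guarding against b = 0
    FROM big_table;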

or, if one query can't produce everything at once:

BEGIN; <-- important: the table is created in the same transaction,
           so the INSERTs below don't need to write xlog
CREATE TABLE derived AS ...
INSERT INTO derived SELECT ... FROM ... (perform all your derived
calculations here)
COMMIT;
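
As a sketch with the same made-up names (the batch column is also
hypothetical), the transactional variant could be:

BEGIN;
-- table created inside this transaction, which (as noted above)
-- avoids xlog writes, provided WAL archiving is off
CREATE TABLE derived AS
    SELECT id, a + b AS sum_ab
    FROM big_table
    WHERE batch = 1;
-- second pass over another slice, same transaction
INSERT INTO derived
    SELECT id, a + b AS sum_ab
    FROM big_table
    WHERE batch = 2;
COMMIT;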

Basically, updating the entire table several times just to add a few
simple columns is a bad idea. If you can compute all the data you need in
one query, like above, it will be much faster, especially if you join one
large table to several smaller ones, and as long as the huge data set
doesn't need to be sorted (check the query plan using EXPLAIN). Try to do
as much as possible in one query, so the large dataset is scanned only
once.
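
For example, prefix the query with EXPLAIN and look for Sort nodes
(small_lookup, weight and lookup_id are made-up names here):

EXPLAIN
SELECT f.id, f.a * s.weight AS weighted_a
FROM big_table f
JOIN small_lookup s ON s.id = f.lookup_id;
-- a Seq Scan on big_table plus a Hash Join to the small table is
-- what you want to see; a Sort node over big_table means the whole
-- data set gets spooled and sorted, which is the slow case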

Note that the above will be faster than updating the entire table,
since it writes much less data: it doesn't need to delete the old rows,
and it doesn't need to write the transaction log (if the transaction
rolls back, the table never existed anyway). Also, since the newly
created table has no indexes, there are no indexes to update during the
load.
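
If the derived table will later be scanned by key, build the indexes
after it is filled, in one pass, and ANALYZE it so the planner has
statistics (derived_id_idx is just a name I picked):

CREATE INDEX derived_id_idx ON derived (id);  -- one bulk build is much
                                              -- cheaper than maintaining
                                              -- the index row by row
ANALYZE derived;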

If you really need to update an entire table multiple times, you will
need to:

- use hardware that can handle disk writes at a decent speed (which is
not a characteristic of a laptop drive);
- use MyISAM, yes (but if you need to make complex queries on the data
afterwards, it could suck).
