| From: | Craig James <craig_james(at)emolecules(dot)com> | 
|---|---|
| To: | pgsql-performance(at)postgresql(dot)org | 
| Subject: | Re: two memory-consuming postgres processes | 
| Date: | 2008-05-02 21:02:24 | 
| Message-ID: | 481B8160.2090106@emolecules.com | 
| Lists: | pgsql-performance | 
On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov <deliverable(at)gmail(dot)com> wrote:
>  I naively thought that if I have a 100,000,000 row table, of the form
> (integer,integer,smallint,date), and add a real column to it, it will scroll
> through the memory reasonably fast.
In Postgres, an update is the same as a delete/insert. That means that changing the data in one column rewrites ALL of the columns for that row, and you end up with a table that's 50% dead space, which you then have to vacuum.
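For illustration, roughly what that looks like (the table and column names here are made up, since the original schema wasn't posted):

```sql
-- Made-up schema along the lines described in the thread.
CREATE TABLE ratings (
    user_id  integer,
    item_id  integer,
    score    smallint,
    rated_on date
);

-- Adding the column itself only touches the catalog and is cheap...
ALTER TABLE ratings ADD COLUMN weight real;

-- ...but filling it in rewrites every row: each UPDATE writes a whole new
-- row version and leaves the old one behind as dead space until VACUUM.
UPDATE ratings SET weight = 0.0;

VACUUM ANALYZE ratings;
```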
Sometimes, if you have a "volatile" column alongside several "static" columns, you're far better off creating a second table for the volatile data, duplicating the primary key in both tables. In your case, that's the difference between 10^8 inserts of (int, float), which is very fast, and what you're doing now: 10^8 inserts and 10^8 deletes of (int, int, smallint, date, float), followed by a big vacuum/analyze (also slow).
The downside of this design is that, later on, it requires a join to fetch all the data for each key.
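A sketch of the split-table layout with the same made-up names, assuming one of the integer columns can serve as the primary key:

```sql
-- Static columns stay in one table, keyed by the (assumed) primary key.
CREATE TABLE ratings_static (
    item_id  integer PRIMARY KEY,
    user_id  integer,
    score    smallint,
    rated_on date
);

-- The volatile column lives in its own narrow table with the same key.
CREATE TABLE ratings_volatile (
    item_id integer PRIMARY KEY REFERENCES ratings_static (item_id),
    weight  real
);

-- Populating the new column is now 10^8 narrow (int, real) INSERTs,
-- with no dead tuples in the wide table and no big vacuum afterwards.
INSERT INTO ratings_volatile (item_id, weight)
SELECT item_id, 0.0 FROM ratings_static;

-- The price is a join whenever you want all the columns for a key:
SELECT s.*, v.weight
FROM ratings_static s
JOIN ratings_volatile v USING (item_id);
```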
You do have a primary key on your data, right? Or some sort of index?
Craig