From: | Ben Campbell <ben(at)scumways(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | problems maintaining boolean columns in a large table |
Date: | 2010-02-09 11:25:33 |
Message-ID: | 4B71462D.3000209@scumways.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I've got a database that holds a bunch of articles in a table called
'article'. It has a bunch of columns, and each row might hold a few KB
of data, say.
I'm maintaining a separate fulltext database, and so I added a boolean
flag, 'needs_indexing' to my 'article' table to keep track of which
articles have been indexed (and I have some trigger functions on
'article' to automatically set the flag if the article is modified).
It all works fine.
Except when I want to rebuild my index from scratch. I need to set all
those flags, but it takes _ages_ to do "UPDATE article SET
needs_indexing=true;" (many hours at least - I've never let it run to
completion)
I _think_ the reason it takes so long is that postgresql doesn't modify
rows in place - it creates an entry for the modified row and zaps the
old one. So by touching _every_ row I'm basically forcing it to rebuild
my whole database... I've got about 2 million rows in 'articles'.
There are a few indexes on columns in 'articles' which obviously will
slow things down too.
I've had a minor attempt at tuning (increased checkpoint_segments) an d
I'm sure there are a bunch of other tricks I could use to bulk-set that
flag in much less time...
But my gut feeling is that the flag would be better off in it's own
table anyway, eg:
CREATE TABLE needs_indexing (
article_id integer references article(id)
);
So, if an article is listed in this table, it needs indexing.
(maybe with a constraint to ensure uniqueness - I only need articles
entered once in this table)
Does this sound like a reasonable way to go?
Any advice or insight welcome!
Thanks,
Ben.
From | Date | Subject | |
---|---|---|---|
Next Message | Boszormenyi Zoltan | 2010-02-09 11:27:05 | ERROR: could not load library "...": Exec format error |
Previous Message | Willy-Bas Loos | 2010-02-09 11:07:02 | dynamic plpgsql command on a record type |