Splitting Postgres into Separate Clusters?

From: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Splitting Postgres into Separate Clusters?
Date: 2013-03-09 19:54:24
Message-ID: CA+6hpamSK-11vpPXH9QwYzOpNfNvo5iEc7UR=SOWGZCKxK9Naw@mail.gmail.com
Lists: pgsql-general

Hello,

I'm running a specialized search engine that indexes a few tens of millions
of web pages, keeping everything in Postgres, and one problem I'm starting
to see is poor cache hit rates. My database has two or three tables just
for the text of the scraped pages, with one row every time a page is
scraped and a `text` column for the HTML. These tables are
almost-but-not-quite write-only. They are only read by one job, which uses
them to create other objects in the system.

I'd like the rest of the database to be in-memory all the time, but I don't
really care if I have to read these tables from disk. To keep my problem
tables from dragging down performance on the rest of the system, I'm
thinking of splitting them out.

I suppose I could save the HTML outside of Postgres entirely, but I'm
wondering if a nicer solution would be to keep these tables in a separate
"cluster" (i.e. /var/lib/postgresql/9.1/something_else -- I hope this is
the right terminology). Then I could tune that cluster differently from the
main cluster, or even put it on a different machine. And I could still use
dblink to query both clusters conveniently (I think; this isn't really that
important). Does this seem like a worthwhile approach? Is there a better
way to deal with a few out-sized tables?
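
For concreteness, here's a rough sketch of what I had in mind, using the
Debian/Ubuntu postgresql-common tooling. The cluster name ("pages"), port,
database, and table/column names are just placeholders, not anything that
exists yet:

    # Create and start a second 9.1 cluster on its own port.
    sudo pg_createcluster --port 5433 9.1 pages
    sudo pg_ctlcluster 9.1 pages start

    # Then tune /etc/postgresql/9.1/pages/postgresql.conf independently of
    # the main cluster -- e.g. a small shared_buffers, since these tables
    # are written constantly but read rarely.

    # From the main cluster, dblink could reach the new one
    # (connection string and table are placeholders):
    psql -d main_db -c "CREATE EXTENSION dblink;"
    psql -d main_db -c "
      SELECT id, html
      FROM dblink('host=localhost port=5433 dbname=pages',
                  'SELECT id, html FROM page_texts WHERE id = 42')
           AS t(id integer, html text);"

The dblink part matters less to me than being able to give the two
clusters different memory and checkpoint settings.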

Thanks,
Paul

--
_________________________________
Pulchritudo splendor veritatis.
