Quick Links

Re: [PERFORM] Need help in setting optimal configuration for a huge database.

From:	Kishore B <kishorebh(at)gmail(dot)com>
To:	"pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject:	Re: [PERFORM] Need help in setting optimal configuration for a huge database.
Date:	2005-10-23 21:55:50
Message-ID:	42567e060510231455l6a6721cdtc52ad1b4d254aa64@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-admin pgsql-performance

Hi Craig,
Does the no of tables and the size of each table affect the performance of
a join operation?
When we are trying to join the two big tables that I described above, pg is
taking so long to execute?
Thank you,
Kishore.

On 10/23/05, Craig A. James <cjames(at)modgraph-usa(dot)com> wrote:
>
>
> > We are facing a* critical situation because of the performance of the
> > **database** .* Even a basic query like select count(*) from
> > bigger_table is taking about 4 minutes to return.
>
> Several other replies have mentioned that COUNT() requires a full table
> scan, but this point can't be emphasized enough: Don't do it! People who are
> migrating from other environments (Oracle or MySQL) are used to COUNT(),
> MIN() and MAX() returning almost instantaneously, certainly on indexed
> columns. But for reasons that have something to do with transactions, these
> operations are unbelievably slow in PostgreSQL.
>
> Here are the alternatives that I've learned.
>
> COUNT() -- There is no good substitute. What I do is create a new column,
> "ROW_NUM" with an auto-incrementing sequence. Every time I insert a row, it
> gets a new value. Unfortunately, this doesn't work if you ever delete a row.
> The alternative is a more complex pair of triggers, one for insert and one
> for delete, that maintains the count in a separate one-row table. It's a
> nuisance, but it's a lot faster than doing a full table scan for every
> COUNT().
>
> MIN() and MAX() -- These are surprisingly slow, because they seem to do a
> full table scan EVEN ON AN INDEXED COLUMN! I don't understand why, but
> happily there is an effective substitute:
>
> select mycolumn from mytable order by mycolumn limit 1; -- same as MIN()
>
> select mycolumn from mytable order by mycolumn desc limit 1; -- same as
> MAX()
>
> For a large table, MIN or MAX can take 5-10 minutes, where the above
> "select..." replacements can return in one millisecond.
>
> You should carefully examine your entire application for COUNT, MIN, and
> MAX, and get rid of them EVERYWHERE. This may be the entire source of your
> problem. It was in my case.
>
> This is, in my humble opinion, the only serious flaw in PostgreSQL. I've
> been totally happy with it in every other way, and once I understood these
> shortcomings, my application is runs faster than ever on PostgreSQL.
>
> Craig
>

In response to

Re: Need help in setting optimal configuration for a huge at 2005-10-23 16:31:44 from Craig A. James

Responses

Re: [PERFORM] Need help in setting optimal configuration at 2005-10-24 15:28:02 from Scott Marlowe

Browse pgsql-admin by date

	From	Date	Subject
Next Message	Tsirkin Evgeny	2005-10-24 14:23:07	Re: Fwd: Question: 2 GB file limit on linux
Previous Message	Kishore B	2005-10-23 21:35:38	Re: Need help in setting optimal configuration for a huge database.

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Guillaume Smet	2005-10-23 23:53:59	Problem analyzing explain analyze output
Previous Message	Kishore B	2005-10-23 21:35:38	Re: Need help in setting optimal configuration for a huge database.