Re: PostgreSQL performance problem -> tuning

From: Richard Huxton <dev(at)archonet(dot)com>
To: Yaroslav Mazurak <yamazurak(at)Lviv(dot)Bank(dot)Gov(dot)UA>, pgsql-performance(at)postgresql(dot)org
Subject: Re: PostgreSQL performance problem -> tuning
Date: 2003-08-06 09:30:36
Message-ID: 200308061030.36610.dev@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wednesday 06 August 2003 08:34, Yaroslav Mazurak wrote:
> Hi All!
>
>
> I have installed PostgreSQL 7.3.2 on FreeBSD 4.7, running on PC with
> CPU Pentium II 400MHz and 384Mb RAM.

Version 7.3.4 is just out - probably worth upgrading as soon as it's
convenient.

> Problem is that SQL statement (see below) is running too long. With
> current WHERE clause 'SUBSTR(2, 2) IN ('NL', 'NM') return 25 records.
> With 1 record, SELECT time is about 50 minutes and takes approx. 120Mb
> RAM. With 25 records SELECT takes about 600Mb of memory and ends after
> about 10 hours with error: "Memory exhausted in AllocSetAlloc(32)".
[snip]
>
> Current postgresql.conf settings (some) are:
>
> === Cut ===
> max_connections = 8
>
> shared_buffers = 8192
> max_fsm_relations = 256
> max_fsm_pages = 65536
> max_locks_per_transaction = 16
> wal_buffers = 256
>
> sort_mem = 131072
This sort_mem value is *very* large - that's 131MB for *each sort* that gets
done. I'd suggest trying something in the range 1,000-10,000. What's probably
happening with the error above is that PG is allocating ridiculous amounts of
memory, the machines going into swap and everything eventually grinds to a
halt.

> vacuum_mem = 16384
>
> checkpoint_segments = 4
> checkpoint_timeout = 300
> commit_delay = 32000
> commit_siblings = 4
> fsync = false

I'd turn fsync back on - unless you don't mind losing your data after a crash.

> enable_seqscan = false

Don't tinker with these in a live system, they're only really for
testing/debugging.

> effective_cache_size = 65536

So you typically get about 256MB cache usage in top/free?

> === Cut ===
>
> SELECT statement is:
>
> SELECT showcalc('B00204', dd, r020, t071) AS s04
> FROM v_file02wide
> WHERE a011 = 3
> AND inrepdate(data)
> AND SUBSTR(ncks, 2, 2) IN ('NL', 'NM')
> AND r030 = 980;

Hmm - mostly views and function calls, OK - I'll read on.

> (cost=174200202474.99..174200202474.99 rows=1 width=143) -> Hash Join
^^^^^^^
This is a BIG cost estimate and you've got lots more like them. I'm guessing
it's because of the sort_mem / enable_seqscan settings you have. The numbers
don't make sense to me - it sounds like you've pushed the cost estimator into
a very strange corner.

> Function showcalc definition is:
>
> CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
> NUMERIC(16)) RETURNS NUMERIC(16)
> LANGUAGE SQL AS '
> -- Parameters: code, dd, r020, t071
> SELECT COALESCE(
> (SELECT sc.koef * $4
> FROM showing AS s NATURAL JOIN showcomp AS sc
> WHERE s.kod LIKE $1
> AND NOT SUBSTR(acc_mask, 1, 1) LIKE ''[''
> AND SUBSTR(acc_mask, 1, 4) LIKE $3
> AND SUBSTR(acc_mask, 5, 1) LIKE SUBSTR($2, 1, 1)),
Obviously, you could use = for these 3 rather than LIKE ^^^
Same below too.

> (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
> LENGTH(acc_mask) - 2), $2, $3, $4), 0))
> FROM showing AS s NATURAL JOIN showcomp AS sc
> WHERE s.kod LIKE $1
> AND SUBSTR(acc_mask, 1, 1) LIKE ''[''),
> 0) AS showing;
> ';
>
> View v_file02wide is:
>
> CREATE VIEW v_file02wide AS
> SELECT id_a011 AS a011, data, obl.ko, obl.nazva AS oblast, b030,
> banx.box AS ncks, banx.nazva AS bank,
> epr.dd, r020, r030, a3, r030.nazva AS valuta, k041,
> -- Sum equivalent in national currency
> t071 * get_kurs(id_r030, data) AS t070,
> t071
> FROM v_file02 AS vf02
> JOIN kod_obl AS obl USING(id_obl)
> JOIN (dov_bank NATURAL JOIN dov_tvbv) AS banx
> ON banx.id_v = vf02.id_v
> AND data BETWEEN COALESCE(banx.dataa, data)
> AND COALESCE(banx.datab, data)
> JOIN ek_pok_r AS epr USING(id_dd)
> JOIN kl_r020 USING(id_r020)
> JOIN kl_r030 AS r030 USING(id_r030)
> JOIN kl_k041 USING(id_k041);
You might want to rewrite the view so it doesn't use explicit JOIN statements,
i.e FROM a,b WHERE a.id=b.ref rather than FROM a JOIN b ON id=ref
At the moment, this will force PG into making the joins in the order you write
them (I think this is changed in v7.4)

> Function inrepdate is:
>
> CREATE OR REPLACE FUNCTION inrepdate(DATE) RETURNS BOOL
> LANGUAGE SQL AS '
> -- Returns true if given date is in repdate
> SELECT (SELECT COUNT(*) FROM repdate
> WHERE $1 BETWEEN COALESCE(data1, CURRENT_DATE)
> AND COALESCE(data2, CURRENT_DATE))
>
> > 0;

You can probably replace this with:
SELECT true FROM repdate WHERE $1 ...
You'll need to look at where it's used though.

[snip table sizes]
> Table has indexes almost for all selected fields.

That's not going to help you for the SUBSTR(...) stuff, although you could use
functional indexes (see manuals/list archives for details).

First thing is to get those two configuration settings somewhere sane, then we
can tune properly. You might like the document at:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

--
Richard Huxton
Archonet Ltd

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Shridhar Daithankar 2003-08-06 10:16:06 Re: How Many Inserts Per Transactions
Previous Message Yaroslav Mazurak 2003-08-06 07:34:54 PostgreSQL performance problem -> tuning