pgsql: Reduce memory usage of tsvector type analyze function.

From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Reduce memory usage of tsvector type analyze function.
Date: 2017-07-12 19:06:59
Message-ID: E1dVMyR-0000it-EO@gemulon.postgresql.org
Lists: pgsql-committers

Reduce memory usage of tsvector type analyze function.

compute_tsvector_stats() detoasted and kept in memory every tsvector value
in the sample, but that can be a lot of memory. The original bug report
described a case using over 10 gigabytes, with statistics target of 10000
(the maximum).

To fix, allocate a separate copy of just the lexemes that we keep around,
and free the detoasted tsvector values as we go. This adds some palloc/pfree
overhead when there are a lot of distinct lexemes in the sample, but it's
better than running out of memory.

Fixes bug #14654 reported by James C. Reviewed by Tom Lane. Backport to
all supported versions.

Discussion: https://www.postgresql.org/message-id/20170514200602.1451.46797@wrigleys.postgresql.org

Branch
------
REL9_5_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/209970ded835ffe2f354220da77f3df9a7a7dab4

Modified Files
--------------
src/backend/tsearch/ts_typanalyze.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
