Re: Thousands of schemas and ANALYZE goes out of memory

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Hugo <hugo(dot)tech(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Thousands of schemas and ANALYZE goes out of memory
Date: 2012-10-03 00:09:14
Message-ID: CAMkU=1yZnAYvMHENt8=9pgwE8q5zmX+mG=SXbFHiLkq_qn0B7Q@mail.gmail.com
Lists: pgsql-general

On Tue, Oct 2, 2012 at 10:38 AM, Hugo <Nabble> <hugo(dot)tech(at)gmail(dot)com> wrote:
>> That might be the problem. I think with 32 bits, you only have 2GB
>> of address space available to any given process, and you just allowed
>> shared_buffers to grab all of it.
>
> The address space for 32 bits is 4GB.

I had thought the highest bit was not usable, but maybe that was just
a Windows thing.

> We just tried to reach a balance in
> the configuration and it seems to be working (except for the ANALYZE command
> when the number of schemas/tables is huge).
>
> Some questions I have:
>
> 1) Is there any reason to run the ANALYZE command in a single transaction?

I don't know how the transactionality of analyze works. I was
surprised to find that I could even run it in an explicit transaction
block; I had thought it would behave like vacuum and create index
concurrently in that regard.
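
For example, this sort of thing is accepted (the table names here are
just placeholders):

    begin;
    analyze some_schema.some_table;    -- analyze one table at a time
    analyze some_schema.another_table; -- inside the same transaction
    commit;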

However, I don't think that would solve your problem. When I run
analyze on each of 220,000 tiny tables by name within one session
(using autocommit, so each in its own transaction), it does run about
4 times faster than just doing a database-wide vacuum that covers
those same tables. (Maybe this is the lock/resource manager issue
that has been fixed for 9.3?) But it takes the same amount of memory.
It is only by closing the connection periodically that I can reduce
the peak memory usage. So I think the memory is going to the
syscache, the catcache, and/or the stats collector, which I believe
are non-transactional, live for the duration of the backend, and have
no way to evict least-recently-used entries once the caches get too
large. Also, some parts of them seem to have O(N^2) behavior, albeit
with a very low constant.
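
If you want a workaround today, a rough sketch (untested; the schema
filter and connection options are just guesses at your setup) is to
analyze in batches and reconnect between batches, so the backend and
its caches get thrown away each time:

    # one psql session per schema; each reconnect discards the
    # backend's syscache/catcache, which is what seems to keep growing
    psql -Atc "select nspname from pg_namespace
               where nspname !~ '^pg_' and nspname <> 'information_schema'" |
    while read schema; do
        psql -Atc "select 'analyze ' || quote_ident(schemaname) || '.'
                          || quote_ident(tablename) || ';'
                   from pg_tables where schemaname = '$schema'" | psql -q
    done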

> 2) Is there any difference running the ANALYZE in the whole database or
> running it per schema, table by table?

I can't think of any important ones, unless there are some tables you
would forget to analyze that way. Does autoanalyze have the same
problem as a manual analyze does? Probably not, unless many of your
tables become eligible at the same time.
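
One way to get a feel for that is to look at when autoanalyze last
visited each table, something like (just a sketch):

    -- shows whether autoanalyze is keeping up, and whether many
    -- tables tend to come due at about the same time
    select schemaname, relname, last_analyze, last_autoanalyze
    from pg_stat_user_tables
    order by last_autoanalyze nulls first
    limit 50;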

Cheers,

Jeff
