Re: Limitations on 7.0.3?

From: Richard Huxton <dev(at)archonet(dot)com>
To: ARTEAGA Jose <Jose(dot)Arteaga(at)alcatel-lucent(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: Limitations on 7.0.3?
Date: 2007-07-17 07:03:26
Message-ID: 469C69BE.5020109@archonet.com
Lists: pgsql-general

ARTEAGA Jose wrote:
> I have spent the last month battling and looking deeper into the issue,
> here's a summary of where I'm at:
> - Increasing shared buffers improved performance but did not resolve the
> backend FATAL disconnect error.
> - Dumping and recreating entire database also did not resolve the issue.

OK, so it's not a corrupted index/file then.

> - re-initializing the DB and recreating from the dump also did not
> resolve the issue.
> On both cases above the issue re-occurred within 2-3 days of run-time
> (insert of new records).
>
> I got the issue narrowed down to the point where I was able to re-create
> the issue at will by just inserting enough data, the data content did
> not matter. The issue always occurred while inserting into my
> "teststeprun" table, which is the largest of my tables (~15 Mill rows).
> The issue is that once I got this table to a certain size, then the
> backend system would crash.
>
> Since I was able to reproduce, I then decided to analyze the core dumps.
> Looking at the core dumps I immediately began to see a pattern, even the
> same pattern was there from the initial core dumps I had when the problem
> began occurring back two months ago. In every case the dump indicated
> the last instruction was always in the call to tag_hash(). I also
> noticed that each time the values passed to tag_hash which are used to
> generate the key were just below the 32-bit max value, and tag_hash
> should be returning a uint32 value. Now I'm really suspecting that there
> is some issue with this. Below are the traces of the four core dumps
> which point to the issue I'm suspecting.

I think tag_hash (in /backend/utils/hash/hashfn.c) is responsible for
internal hash-tables (rather than hash indexes). It takes a pointer to a
key to hash and a keysize (in bytes), so either the pointer is bad or
the size is too long and it's reading off the end.
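
To make the pointer-plus-keysize point concrete, here is a minimal sketch.
It is not the 7.0.3 source: SketchTag and sketch_tag_hash are hypothetical
names, and the mixing step is FNV-1a, chosen only for brevity. What it has
in common with the description above is the interface: the function reads
exactly keysize bytes starting at the key pointer and returns a uint32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup key; not the struct PostgreSQL 7.0.3 uses. */
typedef struct {
    uint32_t rel_id;
    uint32_t block_num;
} SketchTag;

/*
 * Hash exactly `keysize` bytes starting at `key`.
 * The mixing step is FNV-1a (an assumption made for illustration,
 * not the algorithm in hashfn.c).
 */
uint32_t sketch_tag_hash(const void *key, size_t keysize)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;       /* FNV-1a 32-bit offset basis */

    assert(key != NULL);            /* a bad pointer is one failure mode */

    for (size_t i = 0; i < keysize; i++) {
        h ^= p[i];                  /* reads p[0] .. p[keysize - 1] */
        h *= 16777619u;             /* FNV prime; wraps modulo 2^32 */
    }
    return h;
}
```

Called as sketch_tag_hash(&tag, sizeof tag), the loop stays inside the
struct, and large field values are harmless because unsigned arithmetic
simply wraps. If the caller passed a keysize larger than the object, the
loop would read past its end, which is the kind of over-read that could
fault inside the hash function itself.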

At the other end of your call, _bt_insertonpg
(/backend/access/nbtree/nbtinsert.c) is inserting into a btree index. In
one case it's splitting the index page it tries to insert into (because
it's full) but not in the others.

If it's not a hardware-related problem, then it's a bug, but you're
unlikely to get a fix given how old the code is. If an upgrade to 8.2
looks like it will take a lot of effort, perhaps consider an
intermediate upgrade to 7.2 - I think schemas were introduced in 7.3, so
moving to a release before that should be easier.

There is a chance that you might reduce the problem by REINDEXing the
table concerned every night. That's just a guess though, and your real
solution will be to upgrade to something more recent.

--
Richard Huxton
Archonet Ltd
