Re: Segmentation fault with core dump

From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Joshua Berry <yoberi(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL - General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Segmentation fault with core dump
Date: 2013-06-11 14:53:52
Message-ID: 51B73A00.1030206@tpf.co.jp
Lists: pgsql-general

Hi,

(2013/05/09 1:39), Joshua Berry wrote:
> | I'm using PG 9.1.9 with a client application using various versions of the
> | pgsqlODBC driver on Windows. Cursors are used heavily, as well as some pretty
> | heavy trigger queries on db writes which update several materialized views.
> |
> | The server has 48GB RAM installed, PG is configured for 12GB shared buffers,
> | 8MB max_stack_depth, 32MB temp_buffers, and 2MB work_mem. Most of the other
> | settings are defaults.
> |
> | The server will seg fault from every few days to up to two weeks. Each time
> | one of the postgres server processes seg faults, the server gets terminated by
> | signal 11, restarts in recovery for up to 30 seconds, after which time it
> | accepts connections as if nothing ever happened. Unfortunately all the open
> | cursors and connections are lost, so the client apps are left in a bad state.
> |
> | Seg faults have also occurred with PG 8.4. ... I migrated the database to a
> | server running PG9.1 with the hopes that the problem would disappear, but it
> | has not. So now I'm starting to debug.
> |
> | # uname -a
> | Linux [hostname] 2.6.32-358.2.1.el6.x86_64 #1 SMP Tue Mar 12 14:18:09 CDT 2013
> | x86_64 x86_64 x86_64 GNU/Linux
> | # cat /etc/redhat-release
> | Scientific Linux release 6.3 (Carbon)
> |
> | # psql -U jberry
> | psql (9.1.9)
> | Type "help" for help.
> |
> | jberry=# select version();
> | version
> | -------------------------------------------------------------------------------
> | PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
> | 20120313 (Red Hat 4.4.7-3), 64-bit
> | (1 row)
>
> I've had another postmaster segfault on my production server. It appears
> to be the same failure as the last one nearly a month ago, but I wanted
> to post the gdb bt details in case it helps shed light on the issue.
> Please let me know if anyone would like to drill into the dumped core
> with greater detail. Both the OS and PG versions remain unchanged.
>
> Kind Regards,
> -Joshua
>
>
> On Fri, Apr 12, 2013 at 6:12 AM, Andres Freund <andres(at)2ndquadrant(dot)com
> <mailto:andres(at)2ndquadrant(dot)com>> wrote:
>
> On 2013-04-10 19:06:12 -0400, Tom Lane wrote:
> > I wrote:
> > > (Wanders away wondering just how much the regression tests exercise
> > > holdable cursors.)
> >
> > And the answer is they're not testing this code path at all, because if
> > you do
> >     DECLARE c CURSOR WITH HOLD FOR ...
> >     FETCH ALL FROM c;
> > then the second query executes with a portal (and resource owner)
> > created to execute the FETCH command, not directly on the held portal.
> >
> > After a little bit of thought I'm not sure it's even possible to
> > reproduce this problem with libpq, because it doesn't expose any way to
> > issue a bare protocol Execute command against a pre-existing portal.
> > (I had thought psqlODBC went through libpq, but maybe it's playing some
> > games here.)
> >
> > Anyway, I'm thinking the appropriate fix might be like this
> >
> > -    CurrentResourceOwner = portal->resowner;
> > +    if (portal->resowner)
> > +        CurrentResourceOwner = portal->resowner;
> >
> > in several places in pquery.c; that is, keep using
> > TopTransactionResourceOwner if the portal doesn't have its own.
> >
> > A more general but probably much more invasive solution would be to fake
> > up an intermediate portal when pulling data from a held portal, to
> > more closely approximate the explicit-FETCH case.
>
> We could also allocate a new resowner for the duration of that
> transaction. That would get reassigned to the transaction's resowner in
> PreCommit_Portals (after a slight change there).
> That actually seems simple enough?

I made some changes to the multi-thread handling of the psqlodbc driver.
It would also be better to fix the crash on the backend side.

I made two patches.
The first temporarily switches CurrentResourceOwner to
CurTransactionResourceOwner during catalog cache handling.
The second allocates a new resource owner for held portals.
Both fix the crash in my test case.
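
For readers without the attachments, here is a rough standalone sketch of the two ideas discussed in this thread: the NULL guard quoted above, and a temporary switch of the current resource owner that is restored afterwards. It is not the attached patches and not PostgreSQL source. The struct layouts, remember_ref(), and the run_portal_* and lookup_* function names are hypothetical stand-ins invented for illustration; only the names CurrentResourceOwner, CurTransactionResourceOwner, and portal->resowner come from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the PostgreSQL definitions. */
typedef struct ResourceOwner { const char *name; int nrefs; } ResourceOwner;
typedef struct Portal { ResourceOwner *resowner; } Portal; /* NULL models a held portal */

ResourceOwner top_xact_owner = { "TopTransaction", 0 };
ResourceOwner *CurTransactionResourceOwner = &top_xact_owner;
ResourceOwner *CurrentResourceOwner = &top_xact_owner;

/* Hypothetical helper: models code that registers a reference with the
 * current owner. Returns -1 where real code would dereference NULL. */
int remember_ref(void)
{
    if (CurrentResourceOwner == NULL)
        return -1;
    CurrentResourceOwner->nrefs++;
    return 0;
}

/* Unguarded assignment: a held portal leaves the current owner NULL. */
int run_portal_unguarded(Portal *portal)
{
    ResourceOwner *save = CurrentResourceOwner;
    CurrentResourceOwner = portal->resowner;
    int rc = remember_ref();
    CurrentResourceOwner = save;   /* always restore the previous owner */
    return rc;
}

/* Guarded assignment, as in the quoted suggestion: keep the existing
 * owner when the portal does not have its own. */
int run_portal_guarded(Portal *portal)
{
    ResourceOwner *save = CurrentResourceOwner;
    if (portal->resowner)
        CurrentResourceOwner = portal->resowner;
    int rc = remember_ref();
    CurrentResourceOwner = save;
    return rc;
}

/* Temporary switch: use the transaction's owner just for the lookup,
 * then put back whatever was current before (possibly NULL). */
int lookup_with_xact_owner(void)
{
    ResourceOwner *save = CurrentResourceOwner;
    CurrentResourceOwner = CurTransactionResourceOwner;
    int rc = remember_ref();
    CurrentResourceOwner = save;
    assert(CurrentResourceOwner == save);
    return rc;
}
```

In this toy model, a Portal whose resowner is NULL makes run_portal_unguarded() report failure, while both the guarded version and the temporary switch register the reference with the transaction-level owner instead.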

regards,
Hiroshi Inoue

Attachment Content-Type Size
holdable_cursor_printtup.patch text/x-patch 1.4 KB
holdable_cursor_resowner.patch text/x-patch 5.7 KB
