Re: Memory Errors

From: Sam Nelson <samn(at)consistentstate(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Memory Errors
Date: 2010-09-08 20:03:38
Message-ID: AANLkTim3PnJdoh1K_cReRz3sRJjBgwUZmnR5QDOgZH8h@mail.gmail.com
Lists: pgsql-general

It figures I'd have an idea right after posting to the mailing list.

Yeah, running COPY foo TO STDOUT; gets me some data before erroring out, so I
ran COPY (SELECT * FROM foo ORDER BY id ASC) TO STDOUT; to see if I could make
some kind of guess as to whether this was tied to a single row or to something
broader.
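
For reference, the two commands were roughly these (table and column names
simplified to foo):

COPY foo TO STDOUT;
COPY (SELECT * FROM foo ORDER BY id ASC) TO STDOUT;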

I noted the id of the last row the COPY command was able to emit normally and
tried to figure out the next id. The following results started to make me think
along the lines of some kind of bad corruption (even before getting the
responses that agreed with that):

Assuming that the last id copied was 1500:

1) select * from foo where id = (select min(id) from foo where id > 1500);
Results in 0 rows

2) select min(id) from foo where id > 1500;
Results in, for example, 200000

3) select max(id) from foo where id > 1500;
Results in, for example, 90000 (a much lower number than returned by min)

4) select id from foo where id > 1500 order by id asc limit 10;
Results in (for example):

200000
202000
210273
220980
15005
15102
15104
15110
15111
15113

So ... yes, it seems that those first four ids (the ones larger than the
reported max) are somehow part of the problem.
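
As a further sanity check (just a sketch - 1500 and the ids above are the
example values), forcing a sequential scan should show whether the bogus ids
are coming out of an index or out of the heap itself:

SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT min(id), max(id) FROM foo WHERE id > 1500;
SELECT id FROM foo WHERE id > 1500 ORDER BY id ASC LIMIT 10;
RESET enable_indexscan;
RESET enable_bitmapscan;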

They're on Amazon EC2 instances (yeah, we're not too fond of the EC2 boxes
either), so memtest isn't available, but no new corruption has cropped up
since they stopped killing the waiting queries. (I just double-checked: they
were getting corrupted rows constantly, and we haven't gotten one since that
script stopped killing queries.)

We're going to have them try to delete the rows with those ids (even though
the rows don't appear to exist), and if that fails, we'll COPY (SELECT * FROM
foo WHERE id NOT IN (<list>)) TO a file, DROP TABLE foo, re-create it, and
COPY foo back FROM the file (rough sketch below). I'll try to remember to
write back with whether or not any of that worked.
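
Roughly, that plan looks like this (the id list, file path, and column
definitions are placeholders for the real ones):

DELETE FROM foo WHERE id IN (<list>);

-- if the delete fails, dump the good rows, rebuild the table, and reload
COPY (SELECT * FROM foo WHERE id NOT IN (<list>)) TO '/tmp/foo_good.copy';
DROP TABLE foo;
CREATE TABLE foo (<original column definitions>);
COPY foo FROM '/tmp/foo_good.copy';
-- (or \copy from psql if writing server-side files isn't an option)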

On Wed, Sep 8, 2010 at 1:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Sam Nelson <samn(at)consistentstate(dot)com> writes:
> > pg_dump: Error message from server: ERROR: invalid memory alloc request
> > size 18446744073709551613
> > pg_dump: The command was: COPY public.foo (<columns>) TO stdout;
>
> > That seems like an incredibly large memory allocation request - it
> shouldn't
> > be possible for the table to really be that large, should it? Any idea
> what
> > may be wrong if it's actually trying to allocate that much memory for a
> copy
> > command?
>
> What that looks like is data corruption; specifically, a bogus length
> word for a variable-length field.
>
> regards, tom lane
>
