Re: Database corruption in 7.0.3

From:	Denis Perchine <dyp(at)perchine(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Database corruption in 7.0.3
Date:	2001-03-15 09:50:53
Message-ID:	200103150952.PAA27478@technoart.net
Lists:	pgsql-hackers

I can confirm this. I got the same thing just yesterday...

Messages:

NOTICE: Rel acm: TID 1697/217: OID IS INVALID. TUPGONE 1.

There were lots of lines like this, and then:

pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.

In the end :-((( I lost our institute's library... :-((( But I have a
backup!!! :-)))) This table does not even have ANY indices!!!

Program received signal SIGSEGV, Segmentation fault.
0x813837f in PageRepairFragmentation (page=0x82840b0 "") at bufpage.c:311
311 alignedSize = MAXALIGN((*lp).lp_len);
(gdb) bt
#0 0x813837f in PageRepairFragmentation (page=0x82840b0 "") at bufpage.c:311
#1 0x80a9b07 in vc_scanheap (vacrelstats=0x82675b0, onerel=0x8273428, vacuum_pages=0xbfffe928, fraged_pages=0xbfffe918) at vacuum.c:1022
#2 0x80a8e8b in vc_vacone (relid=27296, analyze=0 '\000', va_cols=0x0) at vacuum.c:599
#3 0x80a8217 in vc_vacuum (VacRelP=0xbfffe9b4, analyze=0 '\000', va_cols=0x0) at vacuum.c:299
#4 0x80a818b in vacuum (vacrel=0x8267400 "", verbose=1 '\001', analyze=0 '\000', va_spec=0x0) at vacuum.c:223
#5 0x813fba5 in ProcessUtility (parsetree=0x8267418, dest=Remote) at utility.c:694
#6 0x813c16e in pg_exec_query_dest (query_string=0x820aaa0 "vacuum verbose acm;", dest=Remote, aclOverride=0 '\000') at postgres.c:617
#7 0x813c08e in pg_exec_query (query_string=0x820aaa0 "vacuum verbose acm;") at postgres.c:562
#8 0x813d4c3 in PostgresMain (argc=9, argv=0xbffff068, real_argc=9, real_argv=0xbffffa3c) at postgres.c:1588
#9 0x811ace5 in DoBackend (port=0x8223068) at postmaster.c:2009
#10 0x811a639 in BackendStartup (port=0x8223068) at postmaster.c:1776
#11 0x811932f in ServerLoop () at postmaster.c:1037
#12 0x8118b0e in PostmasterMain (argc=9, argv=0xbffffa3c) at postmaster.c:725
#13 0x80d5e5e in main (argc=9, argv=0xbffffa3c) at main.c:93
#14 0x40111fee in __libc_start_main () from /lib/libc.so.6

This is plain 7.0.3.

On Thursday 15 March 2001 14:52, Tim Allen wrote:
> We have an application that we were running quite happily using pg6.5.3
> in various customer sites. Now we are about to roll out a new version of
> our application, and we are going to use pg7.0.3. However, in testing
> we've come across a couple of isolated incidents of database
> corruption. They are sufficiently rare that I can't reproduce the problem,
> nor can I put my finger on just what application behaviour causes the
> problems.
>
> The symptoms most often involve some sort of index corruption, which is
> reported by vacuum and it seems that vacuum can fix it. On occasion vacuum
> reports "invalid OID" or similar (sorry, don't have exact wording of
> message). On one occasion the database has been corrupted to the point of
> unusability (ie vacuum admitted that it couldn't fix the problem), and a
> dump/restore was required (thankfully that at least worked). The index
> corruption also occasionally manifests itself in the form of spurious
> uniqueness constraint violation errors.
>
> The previous version of our app using 6.5.3 has never shown the slightest
> symptom of database misbehaviour, to the best of my knowledge, despite
> fairly extensive use. So our expectations are fairly high :-).
>
> One thing that is different about the new version of our app is that we
> now use multiple connections to the database (previously we only had
> one). We can in practice have transactions in progress on several
> connections at once, and it is possible for some transactions to be rolled
> back under application control (ie explicit ROLLBACK; statement).
>
> I realise I haven't really provided an awful lot of information that would
> help identify the problem, so I shall attempt to be understanding if
> no-one can offer any useful suggestions. But I hope someone can :-). Has
> anyone seen this sort of problem before? Are there any known
> database-corrupting bugs in 7.0.3? I don't recall anyone mentioning any in
> the mailing lists. Is using multiple connections likely to stimulate any
> known areas of risk?
>
> BTW we are using plain vanilla SQL, no triggers, no new types defined, no
> functions, no referential integrity checks, nothing more ambitious than a
> multi-column primary key.
>
> The platform is x86 Red Hat Linux 6.2. Curiously enough, on one of our
> testing boxes and on my development box we have never seen this, but we
> have seen it several times on our other test box and at least one customer
> site, so there is some possibility it's related to dodgy hardware. The
> customer box with the problem is a multi-processor box, all the other
> boxes we've tested on are single-processor.
>
> TIA for any help,
>
> Tim

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp(at)perchine(dot)com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

In response to

Database corruption in 7.0.3 at 2001-03-15 08:52:13 from Tim Allen
