Re: [HACKERS] Shared memory corruption?

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: tih(at)Hamartun(dot)Priv(dot)NO (Tom I Helbekkmo)
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Shared memory corruption?
Date: 1998-02-12 20:09:24
Message-ID: 199802122010.PAA03196@candle.pha.pa.us
Lists: pgsql-hackers

Vadim, I may need your help on this one. I can reproduce it by running
the regression test while a shell 'while' loop continuously creates
databases:

    while :
    do
        sh -c 'createdb $$'
    done

I get the errors too. I have no idea what the cause is. I hope it is
not the new deadlock code or the locking fixes I did. I think the
message comes from smgrblindwrt. Is it possible our new speedups are
causing it?
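
For anyone else trying this, a bounded variant of the loop above may be
easier to work with, since it stops by itself and reports how many
createdb calls failed. This is only a sketch: the function name
stress_createdb, the iteration count argument, and the CREATEDB override
variable are assumptions made for illustration, and the database names
come from a counter rather than from a fresh shell PID as in the
original loop.

```shell
#!/bin/sh
# Bounded variant of the createdb loop (a sketch, not the original test).
# Assumptions: stress_createdb and the CREATEDB override are hypothetical
# names introduced here; only createdb itself comes from the report.

stress_createdb() {
    count=${1:-100}              # number of attempts (assumed default)
    cmd=${CREATEDB:-createdb}    # command to run; override for a dry run
    i=0
    failures=0
    while [ "$i" -lt "$count" ]
    do
        # Build a unique database name from this shell's PID and a counter.
        if ! "$cmd" "stress_$$_$i" >/dev/null 2>&1
        then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures of $count createdb calls failed"
    # Return success only if every call succeeded.
    [ "$failures" -eq 0 ]
}
```

Running "stress_createdb 50" in a second terminal while the regression
test is going should exercise the same path as the endless loop, and the
summary line gives a rough failure count to compare between snapshots.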

>
> [similar report submitted previously, but this is more complete]
>
> There is something that looks like shared memory corruption going on,
> which I first noticed by accident the other day, in the 1998-02-09
> snapshot. It's still there today, with the 1998-02-12 one, and looks
> like the following on my Sun SS2 under NetBSD/sparc 1.3 (I've created
> a simple test case here, for easy testing elsewhere):
>
> First, I run initdb, start a postmaster, create a user 'tih', stop the
> postmaster, restart the postmaster with '-d', thus:
>
> barsoom:postgres> postmaster -i -d
> FindBackend: searching PATH ...
> FindBackend: found "/usr/local/pgsql/bin/postgres" using PATH
>
> Next, I create a database 'words', thus:
>
> barsoom:tih> createdb words
> barsoom:tih>
>
> The postmaster says:
>
> postmaster: BackendStartup: pid 6542 user tih db template1 socket 5
> postmaster: reaping dead processes...
> postmaster: CleanupProc: pid 6542 exited with status 0
>
> I fire up psql, thus:
>
> barsoom:tih> psql words
> words=>
>
> The postmaster goes:
>
> postmaster: BackendStartup: pid 6549 user tih db words socket 5
>
> In psql, I then do the following:
>
> words=> create table dictionary (entry char(64));
> CREATE
> words=> create unique index dict_by_entry on dictionary (entry);
> CREATE
> words=> copy dictionary from '/usr/share/dict/words';
>
> The postmaster generates no output at this, and the copy starts as it
> should. There is much disk activity. Next, while this is running, in
> another terminal window, as the same user 'tih', I do:
>
> barsoom:tih> createdb
> Connection to database 'template1' failed.
> PQexec() -- There is no connection to the backend.
> createdb: database creation failed on tih.
> barsoom:tih>
>
> When this happens, the postmaster generates the following output:
>
> postmaster: BackendStartup: pid 6560 user tih db template1 socket 5
> ERROR: cannot write block 171 of dict_by_entry [words] blind
> postmaster: reaping dead processes...
> postmaster: CleanupProc: pid 6560 exited with status 0
>
> Looking at processes running on the system at this time, I see:
>
> 6549 p6 R+ 2:01.88 /usr/local/pgsql/bin/postgres -p -Q -P5 -v 65536 words
>
> This is the backend doing the copy. It is spinning furiously, eating
> CPU like there was no tomorrow -- but there is no more disk activity.
> The terminal window where I initiated the copy operation looks as
> though it were proceeding normally. So now I attempt to perform the
> database creation again, thus (in the second terminal):
>
> barsoom:tih> createdb
>
> Nothing happens -- it just hangs there. The postmaster says:
>
> postmaster: BackendStartup: pid 6595 user tih db template1 socket 5
>
> Looking with ps again, I can see that this backend is now also running
> wild, sharing the CPU half and half with the one with PID 6549...
>
> Note that I'm trying to create a different database when it breaks;
> the only possible interaction is through the shared memory that I
> understand is maintained by the postmaster on behalf of the backends.
> As for seeing this on other platforms, I certainly hope it's
> repeatable elsewhere, but it's not unreasonable to assume that it
> could cause different symptoms on other platforms, including quiet
> data corruption...
>
> The whole thing is completely repeatable here -- any ideas can be
> verified quickly and easily -- and with enthusiasm. :-)
>
> -tih
> --
> Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
>
>
>

--
Bruce Momjian
maillist(at)candle(dot)pha(dot)pa(dot)us
