From: | Christoph Berg <cb(at)df7cb(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Notify system doesn't recover from "No space" error |
Date: | 2012-06-29 08:24:30 |
Message-ID: | 20120629082430.GA905@msgid.df7cb.de |
Lists: | pgsql-hackers |
[Resending as the original post didn't get through to the list]
Reviving an old thread here: we ran into the same problem.
The database is PostgreSQL 9.1.4/x86_64 from Debian testing. The client
application is Bucardo, which hammers the database with NOTIFYs
(including some master-master replication conflicts, which might add
to the parallel NOTIFY load).
The problem is reproducible with the attached instructions (several
ENOSPC cycles might be required). When the filesystem is filled using
dd, the bucardo and psql processes die with this error:
ERROR: 53100: could not access status of transaction 0
DETAIL: Could not write to file "pg_notify/0000" at offset 180224: No space left on device.
LOCATION: SlruReportIOError, slru.c:861
The line number may differ; sometimes the error is ENOENT, and
sometimes even "Success".
Even after disk space becomes available again, subsequent "NOTIFY foobar"
calls fail, even with no other clients connected:
ERROR: XX000: could not access status of transaction 0
DETAIL: Could not read from file "pg_notify/0000" at offset 245760: Success.
LOCATION: SlruReportIOError, slru.c:854
Here's a backtrace, caught at slru.c:430:
430 SlruReportIOError(ctl, pageno, xid);
(gdb) bt
#0 SimpleLruReadPage (ctl=ctl@entry=0xb192a0, pageno=30, write_ok=write_ok@entry=1 '\001', xid=xid@entry=0)
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/access/transam/slru.c:430
#1 0x0000000000520d2f in asyncQueueAddEntries (nextNotify=nextNotify@entry=0x29b60c8)
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/commands/async.c:1318
#2 0x000000000052187f in PreCommit_Notify ()
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/commands/async.c:869
#3 0x00000000004973d3 in CommitTransaction ()
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/access/transam/xact.c:1827
#4 0x0000000000497a8d in CommitTransactionCommand ()
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/access/transam/xact.c:2562
#5 0x0000000000649497 in finish_xact_command ()
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:2452
#6 finish_xact_command ()
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:2441
#7 0x000000000064c875 in exec_simple_query (query_string=0x2a99d70 "notify foobar;")
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:1037
#8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x29b1df8, username=<optimized out>)
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/tcop/postgres.c:3968
#9 0x000000000060e731 in BackendRun (port=0x2a14800)
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:3611
#10 BackendStartup (port=0x2a14800)
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:3296
#11 ServerLoop ()
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:1460
#12 0x000000000060f451 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x29b1170)
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/postmaster/postmaster.c:1121
#13 0x0000000000464bc9 in main (argc=5, argv=0x29b1170)
at /home/martin/debian/psql/9.1/build-area/postgresql-9.1-9.1.4/build/../src/backend/main/main.c:199
Restarting the cluster seems to fix the condition in some cases, but
I've seen the error persist across restarts, or reappear after some
time even without the disk being full. (That's also what the customer
on the live system is seeing.)
Christoph
--
cb(at)df7cb(dot)de | http://www.df7cb.de/
Attachment | Content-Type | Size |
---|---|---|
pg_notify_error.sh | application/x-sh | 3.1 KB |