Tom, Bruce, any suggestions?
ryan_rs(at)c4(dot)com (Ryan) wrote in message news:<7299ab58(dot)0110081128(dot)50cf8fba(at)posting(dot)google(dot)com>...
> I changed the number of shared buffers to 3000 and my database locks
> on a simple query. I must kill the database with pg_ctl stop -m i.
> Neither "smart" nor "fast" stops appear to succeed. One CPU gets
> pinned.
>
> When I set the number of shared buffers to 64 everything is fine. No
> data appears to have been corrupted, but I haven't been able to do a
> thorough check. I really didn't have a chance to do much of a
> postmortem. I only have what I can get from my logs:
> - schema
> - query
> - query plan
> - vacuum results
> - postgres log
>
> If anybody wants more data I can reproduce the problem. It is very
> repeatable. Also, I prefer to discover what went wrong rather than
> simply upgrade to 7.1.3 and hope for the best. None of the
> fixes/enhancements listed in 7.1.3 seem relevant to this problem.
>
> I am running postgres 7.1.2 on Solaris 5.7 E450, 4 processors, 2 gig
> ram.
>
> I added the following lines to /etc/system:
> set shmsys:shminfo_shmmax=0x10000000
> set shmsys:shminfo_shmmin=1
> set shmsys:shminfo_shmmni=256
> set shmsys:shminfo_shmseg=256
> set semsys:seminfo_semmap=256
> set semsys:seminfo_semmni=512
> set semsys:seminfo_semmsl=32
> set semsys:seminfo_semmns=512
> set rlim_fd_max=65535
> set rlim_fd_cur=65535
>
> Here is the query that locks the DB:
> select i8, i5, count(*), sum( float8mi(d2,d1) ), sum(i9)::int4 as
> doll from
> calls_1001548800 group by i5, i8;
> i8 has about 5 unique values.
> i5 has two unique values (boolean) plus NULL.
>
> The query was entered into psql by hand.
>
>
> Here is the table def:
> CREATE TABLE calls_1001548800 (
> pk int4 primary key,
> t1 text NOT NULL,
> t2 text NOT NULL,
> i1 int4 NOT NULL,
> i2 int4 NOT NULL,
> d1 double precision NOT NULL,
> d2 double precision NOT NULL,
> d3 double precision NOT NULL,
> d4 double precision NOT NULL,
> i3 int4 NOT NULL,
> d5 real NOT NULL,
> i4 int4 NOT NULL,
> t3 text,
> i5 boolean,
> i6 int4 NOT NULL,
> i7 int4 NOT NULL,
> i8 int2 NOT NULL,
> i9 int4,
> t4 text
> );
> CREATE INDEX calls_1001548800_d21 on calls_1001548800 (
> float8mi(d2,d1) );
> CREATE INDEX calls_1001548800_i3 on calls_1001548800 ( i3 );
> CREATE INDEX calls_1001548800_i9 on calls_1001548800 ( i9 );
>
>
> Vacuum of the table (before the query):
>
> 5681:DEBUG: --Relation calls_1001548800--
> 5682:DEBUG: Pages 915: Changed 54, reaped 559, Empty 0, New 0; Tup
> 33842: Vac 1077, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 1
> 42, MaxLen 648; Re-using: Free/Avail. Space 279912/279912;
> EndEmpty/Avail. Pages 0/559. CPU 0.00s/0.11u sec.
> 5683:DEBUG: Index calls_1001548800_pkey: Pages 179; Tuples 33842:
> Deleted 1077. CPU 0.00s/0.66u sec.
> 5684:DEBUG: Index calls_1001548800_d21: Pages 95; Tuples 33842:
> Deleted 1077. CPU 0.00s/0.70u sec.
> 5685:DEBUG: XLogWrite: new log file created - consider increasing
> WAL_FILES
> 5686:DEBUG: Index calls_1001548800_i3: Pages 98; Tuples 33842:
> Deleted 1077. CPU 0.39s/0.78u sec.
> 5687:DEBUG: Index calls_1001548800_i9: Pages 100; Tuples 33842:
> Deleted 1077. CPU 0.03s/0.66u sec.
> 5688:DEBUG: Rel calls_1001548800: Pages: 915 --> 886; Tuple(s) moved:
> 1122. CPU 0.00s/1.16u sec.
> 5689:DEBUG: Index calls_1001548800_pkey: Pages 179; Tuples 33842:
> Deleted 1122. CPU 0.00s/0.38u sec.
> 5690:DEBUG: Index calls_1001548800_d21: Pages 95; Tuples 33842:
> Deleted 1122. CPU 0.00s/0.40u sec.
> 5691:DEBUG: Index calls_1001548800_i3: Pages 102; Tuples 33842:
> Deleted 1122. CPU 0.01s/0.39u sec.
> 5692:DEBUG: Index calls_1001548800_i9: Pages 104; Tuples 33842:
> Deleted 1122. CPU 0.00s/0.40u sec.
>
>
> Here is a snippet of the log file showing the query plan and the
> events after the query.
>
> 7553:NOTICE: QUERY PLAN:
> 7554:
> 7555:Aggregate (cost=3770.44..4108.86 rows=3384 width=19)
> 7556: -> Group (cost=3770.44..3939.65 rows=33842 width=19)
> 7557: -> Sort (cost=3770.44..3770.44 rows=33842 width=19)
> 7558: -> Seq Scan on calls_1001548800
> (cost=0.00..1224.42 rows=33
> 842 width=19)
> 7559:
> 7560:Smart Shutdown request at Tue Oct 2 18:34:08 2001
> 7561:Fast Shutdown request at Tue Oct 2 18:35:03 2001
> 7562:Aborting any active transaction...
> 7563:FATAL 1: This connection has been terminated by the
> administrator.
> 7564:Immediate Shutdown request at Tue Oct 2 18:39:47 2001
> 7565:NOTICE: Message from PostgreSQL backend:
> 7566: The Postmaster has informed me that some other backend died
> abnormally
> and possibly corrupted shared memory.
> 7567: I have rolled back the current transaction and am going
> to termina
> te your database system connection and exit.
> 7568: Please reconnect to the database system and repeat your query.
> 7569:DEBUG: database system was interrupted at 2001-10-02 18:11:06
> PDT
> 7570:DEBUG: CheckPoint record at (39, 1811142680)
> 7571:DEBUG: Redo record at (39, 1811142680); Undo record at (0, 0);
> Shutdown FA
> LSE
> 7572:DEBUG: NextTransactionId: 2086582; NextOid: 134710037
> 7573:DEBUG: database system was not properly shut down; automatic
> recovery in p
> rogress...
> 7574:DEBUG: ReadRecord: record with zero len at (39, 1811142744)
> 7575:DEBUG: redo is not required
> 7576:DEBUG: database system is in production state
>
>
> Ryan