Re: migrating from single disk to RAID 5

From: Paul Wehr <postgresql(at)industrialsoftworks(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: migrating from single disk to RAID 5
Date: 2001-12-14 01:07:52
Message-ID: 3C1950E8.3070907@industrialsoftworks.com
Lists: pgsql-general

Well, I've upgraded to kernel 2.4.16, which seems to have helped. I
have compiled postgres 7.1.3 with debugging on (level 4), but I still
have one "larger" query (32G of temporary sort space, 3 hours of CPU on
an Athlon 800 w/1.5G RAM) that crashes the backend with "status code
139". Any idea where I could find out what status code 139 means?
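
One plausible reading of that number, offered as a sketch rather than a statement about what the postmaster actually logs: if 139 is a raw wait() status on Linux, the low 7 bits hold the terminating signal and bit 0x80 is the core-dump flag. The shell convention of 128 + signal number gives the same signal. Either way 139 decodes to signal 11, which agrees with the "signal 11, Segmentation fault" line in the gdb output further down. The function names here are made up for illustration.

```python
import signal


def decode_wait_status(status):
    """Split a raw wait() status into (signal_number, core_dumped).

    Assumes the Linux layout: low 7 bits = terminating signal,
    bit 0x80 = core-dump flag. A normal exit yields (0, False).
    Stopped/continued statuses are not handled in this sketch.
    """
    sig = status & 0x7F
    if sig == 0:
        return 0, False
    return sig, bool(status & 0x80)


def signal_name(sig):
    """Return the platform's symbolic name for a signal number, if any."""
    try:
        return signal.Signals(sig).name
    except ValueError:
        return "unknown"


if __name__ == "__main__":
    sig, core = decode_wait_status(139)
    print("signal", sig, signal_name(sig), "core dumped:", core)
```

With this layout, decode_wait_status(139) returns (11, True): killed by signal 11, with a core file written.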

Also, FWIW, here is a backtrace.

Thanks,

-paul

GNU gdb 5.0mdk-11mdk Linux-Mandrake 8.0
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-mandrake-linux"...
Core was generated by `postgres: web bcn [local] SELECT '.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libreadline.so.4.1...done.
Loaded symbols for /lib/libreadline.so.4.1
Reading symbols from /lib/libncurses.so.5...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libgpm.so.1...done.
Loaded symbols for /usr/lib/libgpm.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0 0x08066493 in nocachegetattr ()
#0 0x08066493 in nocachegetattr ()
#1 0x081498a4 in comparetup_heap ()
#2 0x081495ca in tuplesort_heap_insert ()
#3 0x081490ce in beginmerge ()
#4 0x08148ecb in mergeonerun ()
#5 0x08148dab in mergeruns ()
#6 0x081487a5 in tuplesort_performsort ()
#7 0x080c52d0 in ExecSort ()
#8 0x080bd809 in ExecProcNode ()
#9 0x080c3a9f in ExecMergeJoin ()
#10 0x080bd7c0 in ExecProcNode ()
#11 0x080c4547 in ExecNestLoop ()
#12 0x080bd7af in ExecProcNode ()
#13 0x080c4547 in ExecNestLoop ()
#14 0x080bd7af in ExecProcNode ()
#15 0x080c4547 in ExecNestLoop ()
#16 0x080bd7af in ExecProcNode ()
#17 0x080c52b9 in ExecSort ()
#18 0x080bd809 in ExecProcNode ()
#19 0x080c5543 in ExecUnique ()
#20 0x080bd819 in ExecProcNode ()
#21 0x080bc6a6 in ExecutePlan ()
#22 0x080bbd75 in ExecutorRun ()
#23 0x08104d81 in ProcessQuery ()
#24 0x08103779 in pg_exec_query_string ()
#25 0x0810480b in PostgresMain ()
#26 0x080ee514 in DoBackend ()
#27 0x080ee0f5 in BackendStartup ()
#28 0x080ed2c9 in ServerLoop ()
#29 0x080eccd6 in PostmasterMain ()
#30 0x080cc94f in main ()
#31 0x401280de in __libc_start_main () from /lib/libc.so.6

Paul Wehr wrote:

> I've moved our database from a single 30G drive with reiserfs (3.6.25
> on kernel 2.4.9) to a 4-drive RAID 5 of 80G disks with reiserfs on
> that (yielding 167G). The performance is fantastic, but the backend
> server is periodically crashing.
>
> With almost no information whatsoever, would anyone care to speculate
> on whether this is:
> 1) hardware problem (seems unlikely since the drives are < 1 year old)
> 2) reiserfs or kernel bug
> 3) problem with a) shutting down postgres, b) cp -arv /olddata
> /newdata, c) edit /etc/rc.d, d) start postgres
> 4) something else entirely
>
> Also, if it's 1), is there something better to use to check the array
> than "badblocks /dev/md0"?
>
> Not much to go on, I know, but obviously we are in big trouble if the
> database is "randomly" crashing (it seems to work fine maybe 90% of
> the time), and I'm worried about data corruption.
>
> TIA
>
> -paul
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
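
For possibility 3) in the quoted message (the shutdown, "cp -arv /olddata /newdata", restart sequence), one way to rule the copy itself in or out is to compare checksums of the two trees. This is only a sketch: it assumes postgres is stopped while it runs so that no file changes underneath it, that both directories are still available, and the helper names are invented for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(old_root, new_root):
    """Return (missing, extra, changed) relative paths, each sorted."""
    old, new = tree_digests(old_root), tree_digests(new_root)
    missing = sorted(set(old) - set(new))
    extra = sorted(set(new) - set(old))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return missing, extra, changed
```

Calling compare_trees("/olddata", "/newdata") and getting three empty lists back would suggest the copy was faithful at the file-content level; it says nothing about ownership, permissions, or later corruption on the array.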
