Re: WAL recycling, ext3, Linux 2.4.18

From: Doug Fields <dfields-pg-general(at)pexicom(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org, Glenn Stone <gstone(at)pogolinux(dot)com>
Subject: Re: WAL recycling, ext3, Linux 2.4.18
Date: 2002-07-08 16:20:32
Message-ID: 5.1.0.14.2.20020708120620.01f39538@pop.pexicom.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

At 11:57 AM 7/8/2002, Tom Lane wrote:
>Doug Fields <dfields-pg-general(at)pexicom(dot)com> writes:
> > I'm still trying to track down my very odd periodic pauses/hangs in
> > PostgreSQL 7.2.1.
> > I've localized it to what seems to be the "recycled transaction log file"
> > lines in the log file. Whenever this happens, a whole bunch of queries
> > which were "on hold" (just sitting there, as can be seen in
> > pg_stat_activity, when they usually execute in fractions of a second) come
> > back to life and finish very quickly.
>
>Hm. That would be from the tail end of a checkpoint operation. If you
>do a CHECKPOINT command manually, do you get a similar hangup?

In fact, yes. (I never did a CHECKPOINT ever before now.) It took less time
(probably because I did it around half way through the 6.5-minute WAL
recycling period), but had almost identical effects. pg_stat_activity
showed a whole bunch of queries stacked up, and vmstat exhibited the same
pathology: the other postgres processes blocked during the checkpoint (it's
pretty plain in the vmstat output below).

procs memory swap io system
cpu
r b w swpd free buff cache si so bi bo in cs us
sy id
0 0 0 0 3851828 121732
3913552 0 0 2 79 56 62 49 2 49
1 0 0 0 3850724 121732
3913576 0 0 0 1171 214 1023 4 0 95
1 0 0 0 3851544 121732
3913576 0 0 0 266 179 148 1 0 99
0 0 0 0 3851780 121732
3913600 0 0 0 1407 266 364 4 1 95
0 2 1 0 3843020 121732
3913616 0 0 0 4182 195 186 2 4 94
0 2 3 0 3841608 121732
3913616 0 0 0 2722 197 191 0 1 99
0 2 3 0 3840984 121732
3913616 0 0 0 2774 232 298 1 1 99
0 2 3 0 3840056 121732
3913616 0 0 0 2739 200 204 1 1 99
0 2 3 0 3839280 121732
3913616 0 0 0 1634 183 201 1 0 99
0 2 4 0 3839424 121732
3913616 0 0 0 4098 202 143 0 1 99
0 2 4 0 3837672 121732
3913616 0 0 0 2710 194 170 1 1 99
0 2 3 0 3836992 121732
3913616 0 0 0 2139 156 82 0 0 99
0 2 4 0 3827844 121732
3913632 0 0 0 5362 170 3260 3 1 95
0 2 4 0 3828592 121732
3913632 0 0 0 2134 143 53 0 1 99
1 1 0 0 3830892 121744
3913668 0 0 0 4926 152 6087 8 3 89
0 0 0 0 3850360 121744 3913772 0 0 0 7851 610
16106 61 5 34
0 0 0 0 3850360 121744
3913772 0 0 0 20 189 284 1 0 99
0 0 0 0 3850360 121744
3913772 0 0 0 14 147 157 0 0 100

>What are the stuck queries doing exactly? Can you attach to a few of the
>stuck backends with gdb and get stack traces?

The stuck queries are of the following nature, typically either:

1) UPDATE x SET somedate=now() WHERE primarykey IN (1,2,3,4...300 to 450 IDs)
2) DELETE FROM x WHERE key1 = 1 AND key2 IN (1,2,3,4...300 to 450 IDs)

In my pg_stats display below you can see some of them.

Here is a stack trace. I did "where" about every second during the "pause"
and received the same stack trace. This is on PID 3456 per the
pg_stat_activity listing below. After things clear up, I also did a stack
trace; it's blocked on recv, presumably waiting for more commands to come
down the socket. (I tried a few other PIDs with similar stack traces, all
stuck on the semop call.)

(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x4028913f in semop () from /lib/libc.so.6
(gdb) where
#0 0x4028913f in semop () from /lib/libc.so.6
#1 0x08106dec in IpcSemaphoreLock ()
#2 0x0810bce1 in LWLockAcquire ()
#3 0x0808433b in XLogInsert ()
#4 0x08079033 in _bt_doinsert ()
#5 0x08078ea9 in _bt_doinsert ()
#6 0x080786ed in _bt_doinsert ()
#7 0x0807be9d in btinsert ()
#8 0x08152822 in OidFunctionCall5 ()
#9 0x08077cf8 in index_insert ()
#10 0x080c674a in ExecInsertIndexTuples ()
#11 0x080c373e in ExecutorEnd ()
#12 0x080c3374 in ExecutorEnd ()
#13 0x080c2797 in ExecutorRun ()
#14 0x081104de in ProcessQuery ()
#15 0x0810ed70 in pg_exec_query_string ()
#16 0x0810fd5e in PostgresMain ()
#17 0x080f6d4e in ClosePostmasterPorts ()
#18 0x080f669f in ClosePostmasterPorts ()
#19 0x080f5882 in PostmasterMain ()
#20 0x080f5391 in PostmasterMain ()
#21 0x080d4e18 in main ()
#22 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) c
Continuing.

And then later...

(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x402887f2 in recv () from /lib/libc.so.6
(gdb) where
#0 0x402887f2 in recv () from /lib/libc.so.6
#1 0x080d42bc in StreamClose ()
#2 0x080d430d in pq_getbyte ()
#3 0x0810e7c8 in HandleFunctionRequest ()
#4 0x0810e837 in HandleFunctionRequest ()
#5 0x0810fc5e in PostgresMain ()
#6 0x080f6d4e in ClosePostmasterPorts ()
#7 0x080f669f in ClosePostmasterPorts ()
#8 0x080f5882 in PostmasterMain ()
#9 0x080f5391 in PostmasterMain ()
#10 0x080d4e18 in main ()
#11 0x401d114f in __libc_start_main () from /lib/libc.so.6
(gdb) c
Continuing.

> > 1) Is there any known bad interactions with ext3fs and PostgreSQL?
>
>Dunno. The CHECKPOINT would probably create a significant number of
>disk write requests, followed by a sync() request. If that could
>monopolize an ext3 filesystem for a long time, perhaps that would
>explain your problem. But I haven't heard similar complaints before.
>
>What do you have shared_buffers set to?

pexicast_lg=# show shared_buffers;
NOTICE: shared_buffers is 65536
SHOW VARIABLE

I believe this is about 512 megs. I have 8 gigs RAM on this server and
tried 65536 and before it 32768. Either one seems to work fine - I didn't
notice any significant performance difference yet between them.

See below for pg_stat_activity...

I appreciate the help!

Cheers,

Doug

pexicast_lg=# select * from pg_stat_activity where datname='pexicast_lg';
select now();
datid | datname | procpid | usesysid | usename
|
current_query

---------+-------------+---------+----------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3926366 | pexicast_lg | 3393 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(463196,463216,463236,463256,463276,463296,461176,461196,461216,461236,461256,461276,461296,461316,461336,461356,461376,461396,461416,461436,461456,461476,461496,461516,461536,461556,461576,46
3926366 | pexicast_lg | 3412 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(425066,425086,425106,425126,425146,425166,425186,425206,425226,425246,425266,425286,425306,425326,425346,425366,425386,425406,425426,425446,425466,425486,425506,425526,425546,425566,425586,42
3926366 | pexicast_lg | 3443 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(442093,442113,442133,442153,442173,442193,442213,442233,442253,442273,442293,442313,442333,442353,442373,442393,442413,442433,442453,442473,442493,442513,442533,442553,442573,442593,442613,44
3926366 | pexicast_lg | 3444 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(438107,438127,438147,438167,438187,438207,438227,438247,436147,436167,436187,436207,436227,436247,436267,436287,436307,436327,436347,436367,436387,436407,436427,436447,436467,436487,436507,43
3926366 | pexicast_lg | 3446 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(429470,429490,429510,429530,429550,429570,429590,429610,429630,429650,429670,429690,429710,429730,429750,429770,429790,429810,429830,429850,429870,429890,429910,429930,427830,427850,427870,42
3926366 | pexicast_lg | 3447 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(463658,463678,463698,463718,463738,463758,463778,463798,463818,463838,463858,463878,463898,463918,460378,461838,461858,461878,461898,461918,461938,461958,461978,461998,462018,462038,462058,46
3926366 | pexicast_lg | 3448 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(407502,407522,407542,407562,407582,407602,407622,407642,407662,407682,407702,407722,407742,407762,407782,407802,407822,407842,407862,407882,405802,405822,405842,405862,405882,405902,405922,40
3926366 | pexicast_lg | 3449 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(475717,475737,475757,475777,475797,475817,475837,475857,475877,475897,475917,475937,475957,475977,475997,476017,476037,476057,476077,476097,476117,476137,476157,474077,474097,474117,474137,47
3926366 | pexicast_lg | 3450 | 100 | tomcat | <IDLE>
3926366 | pexicast_lg | 3451 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(438472,438492,438512,438532,438552,438572,438592,438612,438632,438652,438672,436572,436592,436612,436632,436652,436672,436692,436712,436732,436752,436772,436792,436812,436832,436852,436872,43
3926366 | pexicast_lg | 3452 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(415401,415421,415441,415461,415481,415501,415521,415541,415561,415581,415601,415621,415641,415661,415681,413581,413601,413621,413641,413661,413681,413701,413721,413741,413761,413781,413801,41
3926366 | pexicast_lg | 3453 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(442908,442928,442948,442968,440868,440888,440908,440928,440948,440968,440988,441008,441028,441048,441068,441088,441108,441128,441148,441168,441188,441208,441228,441248,441268,441288,441308,44
3926366 | pexicast_lg | 3454 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(410524,410544,410564,410584,410604,410624,410644,410664,410684,410704,410724,410744,408644,408664,408684,408704,408724,408744,408764,408784,408804,408824,408844,408864,408884,408904,408924,40
3926366 | pexicast_lg | 3455 | 100 | tomcat | <IDLE>
3926366 | pexicast_lg | 3456 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(451115,451135,451155,451175,451195,451215,451235,451255,451275,451295,451315,451335,451355,451375,451395,451415,451435,451455,451475,451495,451515,451535,451555,449475,449495,449515,449535,44
3926366 | pexicast_lg | 3457 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(408325,408345,408365,408385,408405,408425,408445,408465,408485,408505,408525,408545,406465,406485,406505,406525,406545,406565,406585,406605,406625,406645,406665,406685,406705,406725,406745,40
3926366 | pexicast_lg | 3458 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(397080,397100,397120,397140,397160,397180,397200,397220,397240,397260,397280,397300,397320,397340,397360,397380,397400,397420,397440,397460,397480,397500,397520,397540,397560,397580,397600,39
3926366 | pexicast_lg | 3459 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(476459,476479,476499,476519,476539,476559,476579,476599,476619,476639,476659,476679,476699,476719,476739,476759,474659,474679,474699,474719,474739,474759,474779,474799,474819,474839,474859,47
3926366 | pexicast_lg | 3460 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(411003,411023,411043,411063,411083,411103,411123,411143,411163,411183,411203,411223,411243,411263,411283,411303,411323,411343,411363,411383,411403,411423,411443,411463,411483,411503,411523,41
3926366 | pexicast_lg | 3461 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(432589,432609,432629,432649,432669,432689,432709,432729,432749,432769,432789,432809,432829,432849,432869,432889,432909,430809,430829,430849,430869,430889,430909,430929,430949,430969,430989,43
3926366 | pexicast_lg | 3462 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(470654,470674,470694,470714,470734,470754,470774,470794,470814,470834,470854,470874,470894,470914,470934,470954,470974,470994,471014,471034,471054,471074,471094,471114,471134,471154,471174,47
3926366 | pexicast_lg | 3497 | 100 | tomcat | UPDATE
list_entries SET last_mail=now() WHERE list_entry_id IN
(440551,440571,440591,440611,440631,440651,440671,440691,440711,440731,440751,440771,440791,440811,440831,440851,440871,440891,440911,440931,440951,440971,440991,441011,441031,441051,441071,44
3926366 | pexicast_lg | 4235 | 101 | dfields | <IDLE>
(23 rows)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2002-07-08 16:39:41 Re: PostgreSQL doesn't use indexes even is enable_seqscan = off
Previous Message Manfred Koizar 2002-07-08 16:20:23 Re: PostgreSQL doesn't use indexes even is enable_seqscan = off