Re: Vacuum/Analyze no se esta ejecutando correctamente

From: "Carlos T(dot) Groero Carmona" <ctonetg(at)gmail(dot)com>
To: José González <josego(at)simgia(dot)com>
Cc: Horacio Miranda <hmiranda(at)gmail(dot)com>, Lista PostgreSql <pgsql-es-ayuda(at)postgresql(dot)org>
Subject: Re: Vacuum/Analyze no se esta ejecutando correctamente
Date: 2019-07-01 18:15:50
Message-ID: CABh6Tc12oDfaMPe1Bo+iHoBZdtsZFHGsK6EDj3C0v73_Hq3mvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-es-ayuda

Hola Lista,

Como les dije anteriormente, he tenido un proceso de autovacuum estancado
por mucho tiempo, la vez pasada estuvo corriendo por mas de 30 dias, a y
los 30 dias postgres lo detuvo drasticamente, en el proceso postgres se
detuvo solo.

Buscando en los log encontre que postgres ejecuto -9 para detener el
proceso y durante ee momento la shared memory se corrompio y entonces
postgres se apago, voy a poner copia de los log en este correo.
Esto paso el Sabado 06/22.
Revisando pg_stat_activity veo que esta pasando nuevamente:
pid | query | age
-------+------------------------------------------------------+------------------------
23348 | autovacuum: VACUUM ANALYZE pg_catalog.pg_class | 8 days
18:57:04.333289

Ya intente eliminarlo sin resultados:
postgres=# select pg_terminate_backend(23348);
pg_terminate_backend
----------------------
t
(1 row)

Me preocupa que dentro de 30 dias si no logro encontrar alguna solucion,
postgres se vea afectado nuevamente.

Ah, algo que deben saber, despues que postgres se detuvo, ejecute vacuum
analyze en la BD postgres, y funciono sin problema, sin embargo, despues de
10 min trate de ahcer lo mismo en otra DB pequena 16MB y el proceso se
estanco, no pude deterlo hasta que ejecute -9 y postgres se detuvo y lo
volvi a levantar.

Pero ya esta situacion me esta preocupando bastante.
Alguna sugerencia, comentario sera agradecido.

Gracias,
Carlos

2019-06-22 03:08:58 CDT [4374]:
user=[unknown],db=[unknown],app=[unknown],client=pgbouncer0(35098),trans=0
LOG: 00000: connection received: host=192.168.100.35 port=35098
2019-06-22 03:08:58 CDT [4374]:
user=[unknown],db=[unknown],app=[unknown],client=pgbouncer0(35098),trans=0
LOCATION: BackendInitialize, postmaster.c:4185
2019-06-22 03:08:58 CDT [4374]:
user=postgres,db=production,app=[unknown],client=pgbouncer0(35098),trans=0
FATAL: 57P03: the database system is shutting down
2019-06-22 03:08:58 CDT [4374]:
user=postgres,db=production,app=[unknown],client=pgbouncer0(35098),trans=0
LOCATION: ProcessStartupPacket, postmaster.c:2226
2019-06-22 03:08:58 CDT [4375]:
user=[unknown],db=[unknown],app=[unknown],client=pgbouncer1(45410),trans=0
LOG: 00000: connection received: host=192.168.100.34 port=45410
2019-06-22 03:08:58 CDT [4375]:
user=[unknown],db=[unknown],app=[unknown],client=pgbouncer1(45410),trans=0
LOCATION: BackendInitialize, postmaster.c:4185
2019-06-22 03:08:58 CDT [4375]:
user=postgres,db=production,app=[unknown],client=pgbouncer1(45410),trans=0
FATAL: 57P03: the database system is shutting down
2019-06-22 03:08:58 CDT [4375]:
user=postgres,db=production,app=[unknown],client=pgbouncer1(45410),trans=0
LOCATION: ProcessStartupPacket, postmaster.c:2226
2019-06-22 03:08:59 CDT [30237]: user=,db=,app=,client=,trans=0 LOG:
00000: server process (PID 32520) was terminated by signal 9: Killed
2019-06-22 03:08:59 CDT [30237]: user=,db=,app=,client=,trans=0 DETAIL:
Failed process was running: VACUUM (ANALYZE);
2019-06-22 03:08:59 CDT [30237]: user=,db=,app=,client=,trans=0 LOCATION:
LogChildExit, postmaster.c:3575
2019-06-22 03:08:59 CDT [30237]: user=,db=,app=,client=,trans=0 LOG:
00000: terminating any other active server processes
2019-06-22 03:08:59 CDT [30237]: user=,db=,app=,client=,trans=0 LOCATION:
HandleChildCrash, postmaster.c:3295
2019-06-22 03:08:59 CDT [6279]: user=,db=,app=,client=,trans=2437527164
WARNING: 57P02: terminating connection because of crash of another server
process
2019-06-22 03:08:59 CDT [6279]: user=,db=,app=,client=,trans=2437527164
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
2019-06-22 03:08:59 CDT [6279]: user=,db=,app=,client=,trans=2437527164
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
2019-06-22 03:08:59 CDT [6279]: user=,db=,app=,client=,trans=2437527164
CONTEXT: while updating tuple (4,1) in relation "pg_statistic"
2019-06-22 03:08:59 CDT [6279]: user=,db=,app=,client=,trans=2437527164
LOCATION: quickdie, postgres.c:2626
2019-06-22 03:08:59 CDT [20776]:
user=postgres,db=db_local,app=psql,client=localhost(51950),trans=3118109880
WARNING: 57P02: terminating connection because of crash of another server
process
2019-06-22 03:08:59 CDT [20776]:
user=postgres,db=db_local,app=psql,client=localhost(51950),trans=3118109880
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
2019-06-22 03:08:59 CDT [20776]:
user=postgres,db=db_local,app=psql,client=localhost(51950),trans=3118109880
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
2019-06-22 03:08:59 CDT [20776]:
user=postgres,db=db_local,app=psql,client=localhost(51950),trans=3118109880
CONTEXT: while updating tuple (4,1) in relation "pg_statistic"
2019-06-22 03:08:59 CDT [20776]:
user=postgres,db=db_local,app=psql,client=localhost(51950),trans=3118109880
LOCATION: quickdie, postgres.c:2626
2019-06-22 03:08:59 CDT [25716]:
user=replicant,db=[unknown],app=walreceiver,client=host_standby1(36912),trans=0
WARNING: 57P02: terminating connection because of crash of another server
process
2019-06-22 03:08:59 CDT [25716]:
user=replicant,db=[unknown],app=walreceiver,client=host_standby1(36912),trans=0
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
2019-06-22 03:08:59 CDT [25716]:
user=replicant,db=[unknown],app=walreceiver,client=host_standby1(36912),trans=0
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
2019-06-22 03:08:59 CDT [25716]:
user=replicant,db=[unknown],app=walreceiver,client=host_standby1(36912),trans=0
LOCATION: quickdie, postgres.c:2626
2019-06-22 03:08:59 CDT [20776]:
user=postgres,db=db_local,app=psql,client=localhost(51950),trans=3118109880
LOG: 08006: could not send data to client: Broken pipe
2019-06-22 03:08:59 CDT [20776]:
user=postgres,db=db_local,app=psql,client=localhost(51950),trans=3118109880
CONTEXT: while updating tuple (4,1) in relation "pg_statistic"
2019-06-22 03:08:59 CDT [20776]:
user=postgres,db=db_local,app=psql,client=localhost(51950),trans=3118109880
LOCATION: internal_flush, pqcomm.c:1441
2019-06-22 03:08:59 CDT [20776]:
user=postgres,db=db_local,app=psql,client=localhost(51950),trans=3118109880
STATEMENT: vacuum ANALYZE verbose;
2019-06-22 03:08:59 CDT [30237]: user=,db=,app=,client=,trans=0 LOG:
00000: archiver process (PID 310) exited with exit code 1
2019-06-22 03:08:59 CDT [30237]: user=,db=,app=,client=,trans=0 LOCATION:
LogChildExit, postmaster.c:3553

On Thu, Jun 13, 2019 at 9:23 PM Carlos T. Groero Carmona <ctonetg(at)gmail(dot)com>
wrote:

> En mi caso empezo hace 5 dias y nunca ha terminado, y la base de datos
> solo tiene 1 GB.
>
> Solo ejecute un simple: psql con vacuum analyze;
>
> Tampoco funiono el vacuumdb -az
>
> Ambos comandos han eatado corriendo por dias sin resultado alguno.
>
> Estoy pensando en deterlos con kill -15 pid
>
> No quisiera ejecutar kill -9 pid pues no estoy estoy seguro si ese proceso
> stancado podria afectar el servidor.
>
> Ya trate ejecutar terminate_backend() pero no funciona.
>
> La version de postgres es 9.6.12.
>
> Ante todo gracias por cualquier sugerencia,
>
> Carlos
>
>
>
> On Wed, Jun 12, 2019, 7:57 AM José González <josego(at)simgia(dot)com> wrote:
>
>> Nada. Sigue ejecuntando sin nigun problema. Pero tarda horas y antes lo
>> hacia en minutos.
>>
>> Saludos, jose
>>
>> El mié., 12 jun. 2019 a las 1:40, Horacio Miranda (<hmiranda(at)gmail(dot)com>)
>> escribió:
>>
>>> Que te dicen los logs ?
>>> On 11/06/2019 1:26 AM, José González wrote:
>>>
>>> Tengo el mismo problema con el vacuum. Antes usaba la versión 9.4 y todo
>>> bien. Ahora uso la 9.6 y tengo problemas.
>>>
>>> Saludos, jose
>>>
>>> El dom., 9 jun. 2019 a las 22:07, Carlos T. Groero Carmona (<
>>> ctonetg(at)gmail(dot)com>) escribió:
>>>
>>>>
>>>> Despues de mas de 19 horas seguidas, estos procesos se siguen
>>>> ejecutando, sin reflejar ningun resultado en la base de datos.
>>>>
>>>> pid | query |
>>>> age
>>>>
>>>>
>>>> -------+----------------------------------------------------+-------------------------
>>>>
>>>> 6279 | autovacuum: VACUUM ANALYZE pg_catalog.pg_class | 25 days
>>>> 18:37:41.526188
>>>>
>>>> 32520 | VACUUM (ANALYZE); |
>>>> 19:27:39.746022
>>>>
>>>> 20776 | vacuum ANALYZE verbose; |
>>>> 19:11:23.328425
>>>>
>>>> Los he tratado de detener si resultado alguno.
>>>>
>>>> postgres=# select pg_terminate_backend(32520);
>>>>
>>>> pg_terminate_backend
>>>>
>>>> ----------------------
>>>>
>>>> t
>>>>
>>>> (1 row)
>>>>
>>>>
>>>>
>>>> pid | query |
>>>> age
>>>>
>>>>
>>>> -------+-------------------------------------------------------------+-------------------------
>>>>
>>>> 6279 | autovacuum: VACUUM ANALYZE pg_catalog.pg_class |
>>>> 25 days 18:41:12.710997
>>>>
>>>> 32520 | VACUUM (ANALYZE); |
>>>> 19:31:10.930831
>>>>
>>>> 20776 | vacuum ANALYZE verbose;
>>>>
>>>> Hace 25 dias que el proceso 6279 ha estado corriendo interrumpidamente,
>>>> si no mal recuerdo en un momento que reinicie el servicio de postgres ese
>>>> proceso se detuvo forzosamente y cuando el servicio reanudo entonces el
>>>> proceso se empezo a ajecutar interrumpidamente desde ese entonces.
>>>>
>>>> Autovacuum se esta ejecutando sin problema, pero no puedo ejecutar un
>>>> vacuum manual.
>>>>
>>>> Ejecute un vacuum manual a una tabla specificamente, pero el
>>>> relfrozendid no se reinicio tampoco.
>>>>
>>>> postgres=# vacuum ANALYZE pg_shdescription;
>>>>
>>>> VACUUM
>>>>
>>>>
>>>> postgres=# vacuum pg_shdescription;
>>>>
>>>> VACUUM
>>>>
>>>>
>>>> table_name | age | percentage_transaction_ids_used |
>>>> table_size
>>>>
>>>>
>>>> -----------------------+-----------+---------------------------------+------------
>>>>
>>>> pg_shdescription | 747265306 | 35.584 |
>>>> 48 kB
>>>>
>>>>
>>>>
>>>> No recuerdo si lo dije anteriormente, pero el mismo cron job ha estado
>>>> ejecutandoce por casi 4 meses en mi servidor de pre-production sin ningun
>>>> tipo de problema.
>>>>
>>>>
>>>> Algo que deben saber, que este es el servidor que hace unas semanas que
>>>> migre hace 9.6 del servidor que tenia la tabala pg_databases corructa en
>>>> 9.3, y ahora que me pongo a pensar desde que lo hice el analyze y
>>>> vacuum manual de alguna manera u otra no han funcionado correctamente, pero
>>>> autovacuum reinicio todo los XID para evitar un wraparound en us momento.
>>>>
>>>>
>>>> Este un comportamiento que no habia visto antes en postgres, asi que
>>>> agradecere cualquier tipo de ayuda, sugerencia o documentacion que me guie
>>>> a una solucion.
>>>>
>>>> Carlos
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Jun 9, 2019 at 1:59 AM Carlos T. Groero Carmona <
>>>> ctonetg(at)gmail(dot)com> wrote:
>>>>
>>>>> Hola lista,
>>>>>
>>>>> Esyou utilizando la version 9.6 de postgres, y hal algo que me tiene
>>>>> un poco confundifo con utovacuum/analyze.
>>>>>
>>>>> Primero desde que migre a este nueva version, se ha estado ejecuatando
>>>>> un proceso autovacuum analyze por mas de 24 dias, he tratado de eliminarlo
>>>>> ejecutando terminate_backend() y no se detiene, tambien trate con kill pid
>>>>> y tampoco se detiene.
>>>>>
>>>>> Lo otro que me esta sucediendo un poco raro es que se esta demorando
>>>>> demaisado un siemple vacuum analyze en base de datos con menos de 1GB de
>>>>> tamano, ademas lo ejecute en postgres database, y nunca se reinicio el xid.
>>>>>
>>>>> postgres=# vacuum ANALYZE ;
>>>>>
>>>>> VACUUM
>>>>>
>>>>> postgres=# SELECT
>>>>>
>>>>> datname,
>>>>>
>>>>> max(age(datfrozenxid)),
>>>>>
>>>>> round(max(age(datfrozenxid)) / 2100000000.0 * 100.0, 3) AS
>>>>> percentage_transaction_ids_used
>>>>>
>>>>> FROM pg_database
>>>>>
>>>>> group by datname
>>>>>
>>>>> order by 2 desc;
>>>>>
>>>>> datname | max | percentage_transaction_ids_used
>>>>>
>>>>>
>>>>> -------------------------+-----------+---------------------------------
>>>>>
>>>>> sami_production | 874194263 | 41.628
>>>>>
>>>>> locations | 799037901 | 38.049
>>>>>
>>>>> ofx_production | 799037901 | 38.049
>>>>>
>>>>> locations_backup | 799037901 | 38.049
>>>>>
>>>>> internal_p2p_production | 799035844 | 38.049
>>>>>
>>>>> template1 | 799035844 | 38.049
>>>>>
>>>>> semi_temp | 799035709 | 38.049
>>>>>
>>>>> auth_db | 749061558 | 35.670
>>>>>
>>>>> template0 | 749045061 | 35.669
>>>>>
>>>>> postgres | 730570342 | 34.789
>>>>>
>>>>> (10 rows)
>>>>>
>>>>>
>>>>> el xid in postgres deberia estar por debajo del 10% despues de
>>>>> ejecutar vacuum en esa base de datos.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> postgres=# select pg_terminate_backend(20776);
>>>>>
>>>>> pg_terminate_backend
>>>>>
>>>>> ----------------------
>>>>>
>>>>> t
>>>>>
>>>>> (1 row)
>>>>>
>>>>>
>>>>> postgres=# select pid, query, age(now(), xact_start) from
>>>>> pg_stat_activity where upper(query) like upper('vacuu%') and state =
>>>>> 'active' or upper(query) like upper('auto%') order by 3 desc;
>>>>>
>>>>> pid | query |
>>>>> age
>>>>>
>>>>>
>>>>> -------+------------------------------------------------------+-------------------------
>>>>>
>>>>> 6279 | autovacuum: VACUUM ANALYZE pg_catalog.pg_class | 24
>>>>> days 23:43:14.417379
>>>>>
>>>>> 26680 | autovacuum: VACUUM public.transactions_w |
>>>>> 02:43:22.432889
>>>>>
>>>>> 28142 | autovacuum: VACUUM public.featureusagereportusers |
>>>>> 01:06:05.995937
>>>>>
>>>>> 1682 | autovacuum: VACUUM public.sessionusagereportusers |
>>>>> 00:58:54.605847
>>>>>
>>>>> 30236 | autovacuum: VACUUM ANALYZE public.sessions |
>>>>> 00:43:47.847795
>>>>>
>>>>> 32520 | VACUUM (ANALYZE); |
>>>>> 00:33:12.637213
>>>>>
>>>>> 20776 | vacuum ANALYZE verbose; |
>>>>> 00:16:56.219616
>>>>>
>>>>> 10801 | autovacuum: VACUUM ANALYZE pg_catalog.pg_attribute |
>>>>> 00:16:54.8015
>>>>>
>>>>> 24212 | autovacuum: VACUUM public.transactions_f |
>>>>> 00:16:27.303903
>>>>>
>>>>> 1533 | autovacuum: VACUUM public.mobileusermo . |
>>>>> 00:10:12.41268
>>>>>
>>>>> 27837 | autovacuum: VACUUM public.batchhistories_fieldpoint |
>>>>> 00:00:53.361894
>>>>>
>>>>> 30151 | autovacuum: VACUUM public.featureusagereportyearlies |
>>>>> 00:00:23.799779
>>>>>
>>>>> (12 rows)
>>>>>
>>>>>
>>>>> El proceso 32520 es ejecutado por un cronjob que tengo planificado
>>>>> correr cada domingo a la 1am
>>>>>
>>>>> y el proceso 20776 lo ejecute para probar si mi sospecha de que vacuum
>>>>> no se esta ejecutando correctamente y no quisiera usar la palabra corrupta,
>>>>> apenas cabo de migrar a la version 9.6 porque tuve pg_databa estaba
>>>>> corructa en la version 9.3
>>>>>
>>>>>
>>>>> sobre autovacuum solo tengo 10 workers y el cost_limit es 1200.
>>>>>
>>>>>
>>>>> Alguna idea de lo que esta pasando y como podria detener esos
>>>>> processos que sin afectar el cluster.
>>>>>
>>>>>
>>>>> es una option kill -9 vacuum_pid?
>>>>>
>>>>>
>>>>> Una vez mas gracias por la ayuda,
>>>>>
>>>>> Carlos
>>>>>
>>>>

In response to

Browse pgsql-es-ayuda by date

  From Date Subject
Next Message Alvaro Herrera 2019-07-01 21:35:20 Re: Fallo en función
Previous Message Alvaro Herrera 2019-06-28 20:33:28 Re: Múltiples réplicas con hot_standby_feedback activo