答复: [External] Re: pgadmin--pgagent---the process hang by unknow reasons

From: Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: "pgadmin-support(at)postgresql(dot)org" <pgadmin-support(at)postgresql(dot)org>
Subject: 答复: [External] Re: pgadmin--pgagent---the process hang by unknow reasons
Date: 2020-12-08 10:07:35
Message-ID: HK2PR03MB46100F14A6DC1E76834E7B1AA8CD0@HK2PR03MB4610.apcprd03.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-support pgsql-bugs

Hi Dave

Thanks for your help. I have upgrade to pgagent 4.0 on 12/7 with your recommendation.
CREATE EXTENSION pgagent FROM unpackaged;

Owing to my PostgreSQL server didn’t install by yum. So I copy those file to my extension directory.
Then create extension command work.
cp /usr/pgsql-10/share/extension/pgagent* /data/postgres/share/extension/

After I upgrade pgagent to 4.0. I test some application. It’s work normal. Pgagent 4.0 work as expected.
But when it running almost 1 hour. It experience the problem which same with pgagent 3.4.
I attach the logs and stack for your reference. The pgagent stop on this function “in __lll_lock_wait ()”
The log on the attachment for your reference.
Env: PG 10.5 and OS is Centos 6.9

postgres(at)sltfjfrauxq ~]$ pstack 21492
Thread 4 (Thread 0x7ff8dffff700 (LWP 24059)):
#0 0x00007ff8ea665334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ff8ea6605d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007ff8ea6604a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ff8ebfbbc01 in DBconn::Return() ()
#4 0x00007ff8ebfc7f32 in Job::Execute() ()
#5 0x00007ff8ebfcac58 in JobThread::operator()() ()
#6 0x00007ff8eb221d47 in thread_proxy () from /usr/lib64/libboost_thread-mt.so.5
#7 0x00007ff8ea65eaa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007ff8ea3abc4d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7ff8debfd700 (LWP 28530)):
#0 0x00007ff8ea3a2403 in poll () from /lib64/libc.so.6
#1 0x00007ff8ebb5828f in ?? () from /usr/lib64/libpq.so.5
#2 0x00007ff8ebb58310 in ?? () from /usr/lib64/libpq.so.5
#3 0x00007ff8ebb538e2 in ?? () from /usr/lib64/libpq.so.5
#4 0x00007ff8ebb5465f in PQconnectdb () from /usr/lib64/libpq.so.5
#5 0x00007ff8ebfba6b5 in DBconn::Connect(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) ()
#6 0x00007ff8ebfba7c4 in DBconn::DBconn(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) ()
#7 0x00007ff8ebfbc9da in DBconn::Get(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) ()
#8 0x00007ff8ebfc7d54 in Job::Execute() ()
#9 0x00007ff8ebfcac58 in JobThread::operator()() ()
#10 0x00007ff8eb221d47 in thread_proxy () from /usr/lib64/libboost_thread-mt.so.5
#11 0x00007ff8ea65eaa1 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ff8ea3abc4d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7ff8de1fc700 (LWP 28531)):
#0 0x00007ff8ea665334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ff8ea6605d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007ff8ea6604a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ff8ebfbc9a7 in DBconn::Get(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > const&) ()
#4 0x00007ff8ebfc7d54 in Job::Execute() ()
#5 0x00007ff8ebfcac58 in JobThread::operator()() ()
#6 0x00007ff8eb221d47 in thread_proxy () from /usr/lib64/libboost_thread-mt.so.5
#7 0x00007ff8ea65eaa1 in start_thread () from /lib64/libpthread.so.0
#8 0x00007ff8ea3abc4d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7ff8ec1e07e0 (LWP 21492)):
#0 0x00007ff8ea665334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ff8ea6605d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007ff8ea6604a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ff8ebfbc197 in DBconn::ClearConnections(bool) ()
#4 0x00007ff8ebfd8909 in MainRestartLoop(DBconn*) ()
#5 0x00007ff8ebfd9993 in MainLoop() ()
#6 0x00007ff8ebfdd872 in main ()
[postgres(at)sltfjfrauxq ~]$ kill -9 21492
[postgres(at)sltfjfrauxq ~]$ pgagent_10 -s ./pg_agent_12_8_after_upgrade_version_upgrade.log -l 2 hostaddr=127.0.0.1 port=5432 dbname=postgres user=postgres password=abcd-1234
[postgres(at)sltfjfrauxq ~]$ psql
psql (10.5)

I have a questions about pgagent tables. There are many tables which used by pgagent.
I couldn’t find a document to describe those pgagent system tables. I want to know more detail about the tables and every column.
I want to deploy a monitor script to monitor pgagent state and every job execution result.
Some job failed with “couldn’t get connection with database” . Most of jobs could execute successfully.

Which sql statement to query result to display on the pgadmin page ? I want to use same sql to query latest pgagent jobs status.

[cid:image017(dot)png(at)01D6CD8D(dot)A032E0A0]

徐志宇(Jack)
Database Engineer

DB Team,ITS. Lenovo China
Phone: 86-18910860709
Email:xuzy13(at)lenovo(dot)com
No.6 Shangdi West Road, Haidian District Beijing, China, 100085

发件人: Dave Page <dpage(at)pgadmin(dot)org>
发送时间: 2020年12月4日 17:39
收件人: Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com>
抄送: pgadmin-support(at)postgresql(dot)org
主题: Re: [External] Re: pgadmin--pgagent---the process hang by unknow reasons

Hi

On Fri, Dec 4, 2020 at 9:00 AM Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com<mailto:xuzy13(at)lenovo(dot)com>> wrote:
Hi Dave

I am appreciate you help on this problem.
The production system installed by my teammate. He told me that. He experience some problem when build and compile pgagent 3.4.0 file.
So he install it by yum resource. Then create pgagent tables by execute a script. Look like he didn’t create pgagent extension.
On the production system. I could find those tables which used by pgagent_10 3.4.

Ah, OK. Try running the following (assuming *only* 3.4 is installed):

CREATE EXTENSION pgagent FROM unpackaged;

That should use the pgagent--unpackaged--3.4.sql script to convert the existing 3.4 installation into an extension.

Then, you should be able to do the 4.0 upgrade by installing the package, updating the extension, and then restarting pgAgent.

[cid:image018(dot)png(at)01D6CD8D(dot)A032E0A0]

But I can’t list pgagent extension.
[cid:image019(dot)jpg(at)01D6CD8D(dot)A032E0A0]

Currently. There are over 30 jobs running by pgagent. Most of time pgagent could running well.
I don’t want to re-create those jobs. I want to try to find a solution to upgrade pgagent on this env.
If this env have unknown problem when deploy. Do you have any suggestion to upgrade 4.0 without re-create all pgagent jobs ? Thanks

徐志宇(Jack)
Database Engineer

DB Team,ITS. Lenovo China
Phone: 86-18910860709
Email:xuzy13(at)lenovo(dot)com<mailto:Email%3Axuzy13(at)lenovo(dot)com>
No.6 Shangdi West Road, Haidian District Beijing, China, 100085

发件人: Dave Page <dpage(at)pgadmin(dot)org<mailto:dpage(at)pgadmin(dot)org>>
发送时间: 2020年12月2日 21:16
收件人: Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com<mailto:xuzy13(at)lenovo(dot)com>>
抄送: pgadmin-support(at)postgresql(dot)org<mailto:pgadmin-support(at)postgresql(dot)org>
主题: Re: [External] Re: pgadmin--pgagent---the process hang by unknow reasons

Hi

On Wed, Dec 2, 2020 at 6:50 AM Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com<mailto:xuzy13(at)lenovo(dot)com>> wrote:
Hi Dave

As a reference point, I installed a fresh CentOS 6.10 VM (which is somewhat less easy than it used to be, since it went EOL 2 days ago and the repos and ISOs have all been moved). I then installed postgresql10-server, ran initdb and started the server. Once that was up and running, I installed pgagent_10-3.4.0. I was able to login to Postgres and run "CREATE EXTENSION pgagent;" without any errors. I then started pgAgent to ensure it ran correctly, which it did.

Then, I ran a "yum upgrade pgagent_10", which installed the 4.0.0 release. In PostgreSQL, I ran "ALTER EXTENSION pgagent UPDATE;", which successfully upgraded the schema. I restarted pgAgent and everything looked good.

So, this doesn't appear to be a general problem, but something specific to your system. Is there anything in the process above that doesn't match with what you've tried to do?

More comments below....

Thanks for your update. I collect logs for your reference.
By the way. Is there any upgrade document which mentioned this scenario ? upgrade pgagent_10 3.4 to pgagent_10 4.0.
5: start pgagent process with new version pgagent.
With what error? Anything in the pgAgent log or /var/log/messages etc?
There is no errors.The pgagent process disappear after execute the command.
On pgagent_10 3.4 version. There is a pgagent_10 process running.

[cid:image020(dot)png(at)01D6CD8D(dot)A032E0A0]

If you don't include the "-f" command line option, pgAgent will detach from the foreground and run as a background process, returning you to a shell prompt. I would suggest running ps to check you don't have a number of pgAgent processes running in the background (and if you do, kill them).

Also note that any command line options *must* come before the connection string, not after it. That would explain why the -s and -l options you've used above don't seem to do anything.

LINE 1: CREATE EXTENSION pgagent UPDATE;
^
postgres=# alter EXTENSION pgagent UPDATE;
ERROR: extension "pgagent" does not exist

That implies that pgAgent isn't installed properly. Are you using the PostgreSQL packages from yum.postgresql.org<http://yum.postgresql.org>? If not, how did you install PostgreSQL? The pgagent_10 package is designed to work with the PostgreSQL 10 RPMs from the same yum.postgresql.org<http://yum.postgresql.org> repo.

徐志宇(Jack)
Database Engineer

DB Team,ITS. Lenovo China
Phone: 86-18910860709
Email:xuzy13(at)lenovo(dot)com<mailto:Email%3Axuzy13(at)lenovo(dot)com>
No.6 Shangdi West Road, Haidian District Beijing, China, 100085

发件人: Dave Page <dpage(at)pgadmin(dot)org<mailto:dpage(at)pgadmin(dot)org>>
发送时间: 2020年12月1日 22:27
收件人: Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com<mailto:xuzy13(at)lenovo(dot)com>>
抄送: pgadmin-support(at)postgresql(dot)org<mailto:pgadmin-support(at)postgresql(dot)org>
主题: Re: [External] Re: pgadmin--pgagent---the process hang by unknow reasons

Hi

On Tue, Dec 1, 2020 at 12:38 PM Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com<mailto:xuzy13(at)lenovo(dot)com>> wrote:
Hi Dave

The command output is :

My test system. All commands executed on this system.
Before upgrade pgagent. I execute follow in steps.

1: backup and restore production database.
2: install pgagent by yum command .
Yum install pgagent_10
Yum downgrade pgagent_10. (Then pgagent backup to 3.4 )

3: start pgagent process. Make sure pgagent running on normal state.

4: execute yum upgrade command to pgagent 4.0

5: start pgagent process with new version pgagent.
Failed.

With what error? Anything in the pgAgent log or /var/log/messages etc?

6: try to upgrade by the upgrade script.
Failed.

With what error?

7: Build and compile pgagent 4.2.
Try to upgrade pgagent. Failed.

[root(at)slbwcbnos2 ~]# ls -al /usr/pgsql-10/share/extension
total 40
drwxr-xr-x 2 root root 118 Dec 1 14:59 .
drwxr-xr-x 3 root root 23 Dec 1 13:55 ..
-rw-r--r-- 1 root root 538 Jul 12 2018 pgagent--3.4--4.0.sql
-rw-r--r-- 1 root root 27657 Apr 5 2020 pgagent--4.0.sql
-rw-r--r-- 1 root root 139 Apr 5 2020 pgagent.control
-rw-r--r-- 1 root root 2603 Jul 12 2018 pgagent--unpackaged--4.0.sql

That indicates that pgAgent 4.0 is installed, not 4.2. I would suggest not trying to build it yourself at this stage.

Our production system.

[root(at)sltfjfrauxq ~]# ls -al /usr/pgsql-10/share/extension
total 44
drwxr-xr-x 2 root root 4096 Nov 30 23:46 .
drwxr-xr-x 3 root root 4096 Nov 22 2018 ..
-rw-r--r-- 1 root root 27665 Oct 18 2017 pgagent--3.4.sql
-rw-r--r-- 1 root root 139 Oct 18 2017 pgagent.control
-rw-r--r-- 1 root root 2358 Feb 26 2014 pgagent--unpackaged--3.4.sql
[root(at)sltfjfrauxq ~]#

Right - that looks like 3.4 is installed, as expected.

徐志宇(Jack)
Database Engineer

DB Team,ITS. Lenovo China
Phone: 86-18910860709
Email:xuzy13(at)lenovo(dot)com<mailto:Email%3Axuzy13(at)lenovo(dot)com>
No.6 Shangdi West Road, Haidian District Beijing, China, 100085

发件人: Dave Page <dpage(at)pgadmin(dot)org<mailto:dpage(at)pgadmin(dot)org>>
发送时间: 2020年12月1日 18:33
收件人: Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com<mailto:xuzy13(at)lenovo(dot)com>>
抄送: pgadmin-support(at)postgresql(dot)org<mailto:pgadmin-support(at)postgresql(dot)org>
主题: Re: [External] Re: pgadmin--pgagent---the process hang by unknow reasons

Hi

It sounds like the extension hasn't been installed properly at the operating system level - what is the output from:

ls -al /usr/pgsql-10/share/extension

On Tue, Dec 1, 2020 at 10:00 AM Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com<mailto:xuzy13(at)lenovo(dot)com>> wrote:
Hi Dave

Thanks for your quick response.

I try this command on 4.0 and 4.2. I got a warning that the extension not exist.

[cid:image021(dot)png(at)01D6CD8D(dot)A032E0A0]

[cid:image022(dot)png(at)01D6CD8D(dot)A032E0A0]

徐志宇(Jack)
Database Engineer

DB Team,ITS. Lenovo China
Phone: 86-18910860709
Email:xuzy13(at)lenovo(dot)com<mailto:Email%3Axuzy13(at)lenovo(dot)com>
No.6 Shangdi West Road, Haidian District Beijing, China, 100085

发件人: Dave Page <dpage(at)pgadmin(dot)org<mailto:dpage(at)pgadmin(dot)org>>
发送时间: 2020年12月1日 17:48
收件人: Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com<mailto:xuzy13(at)lenovo(dot)com>>
抄送: pgadmin-support(at)postgresql(dot)org<mailto:pgadmin-support(at)postgresql(dot)org>
主题: Re: [External] Re: pgadmin--pgagent---the process hang by unknow reasons

Hi

On Tue, Dec 1, 2020 at 9:17 AM Zhiyu ZY13 Xu <xuzy13(at)lenovo(dot)com<mailto:xuzy13(at)lenovo(dot)com>> wrote:
Hi Dave

Thanks for your update. I try to upgrade pgagent_10-3.4.0. But failed.
Would you like to help me narrow down this upgrade issue ? Thanks

upgrade path
version upgrade
status
yum install pgagent_10
\i pgagent--3.4--4.0.sql
create extension pgagent;
3.4 – 4.0
failed
compile and build. (install cmake and Boost)

\i pgagent--3.4--4.2.sql
create extension pgagent;
3.4 – 4.2
failed.

Currently pgagent_10 3.4.0 env installed by my teammate. He told me that he install it by yum command and execute a script.
Then those pgagent tables created by this script. There is no extension exist. But those pgagent table exist.

[cid:image023(dot)png(at)01D6CD8D(dot)A032E0A0]

Then I try upgrade pgagent_10 3.40 to 4.0 by yum command. It’s successfully upgrade by yum command.
But the pgagent_10 process unable start.

[root(at)sltfjfrauxq ~]# yum upgrade pgagent_10

Total download size: 138 k
Is this ok [y/N]: y
Downloading Packages:
pgagent_10-4.0.0-4.rhel6.x86_64.rpm | 138 kB 00:08
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Updating : pgagent_10-4.0.0-4.rhel6.x86_64 1/2
Cleanup : pgagent_10-3.4.0-10.rhel6.x86_64 2/2
Verifying : pgagent_10-4.0.0-4.rhel6.x86_64 1/2
Verifying : pgagent_10-3.4.0-10.rhel6.x86_64 2/2

Updated:
pgagent_10.x86_64 0:4.0.0-4.rhel6

Complete!
Version: 4.0.0

[cid:image024(dot)jpg(at)01D6CD8D(dot)A032E0A0]

The 4.0 pgagent start logs.
Tue Dec 1 13:57:53 2020 DEBUG: Creating primary connection
Tue Dec 1 13:57:53 2020 DEBUG: Parsing connection information...
Tue Dec 1 13:57:53 2020 DEBUG: user: postgres
Tue Dec 1 13:57:53 2020 DEBUG: password: *****
Tue Dec 1 13:57:53 2020 DEBUG: dbname: postgres
Tue Dec 1 13:57:53 2020 DEBUG: hostaddr: 127.0.0.1
Tue Dec 1 13:57:53 2020 DEBUG: port: 5432
Tue Dec 1 13:57:53 2020 DEBUG: Creating DB connection: user=postgres password=abcd-1234 hostaddr=127.0.0.1 port=5432 dbname=postgres
Tue Dec 1 13:57:53 2020 DEBUG: Database sanity check

I try to upgrade it by sql. But still on failed.

[root(at)slbwcbnos2 sql]# pwd
/data/postgres/new_package/pgAgent-4.0.0-Source/sql
[root(at)slbwcbnos2 sql]# ls -al
total 36
drwxr-xr-x 2 501 games 90 Dec 1 14:25 .
drwxr-xr-x 6 501 games 298 Jul 12 2018 ..
-rw-r--r-- 1 501 games 538 Jul 12 2018 pgagent--3.4--4.0.sql
-rw-r--r-- 1 501 games 27772 Jul 12 2018 pgagent.sql
-rw-r--r-- 1 501 games 2603 Jul 12 2018 pgagent--unpackaged--4.0.sql

postgres=# \i pgagent--3.4--4.0.sql
Use "CREATE EXTENSION pgagent UPDATE" to load this file.
postgres=# CREATE EXTENSION pgagent UPDATE;
ERROR: syntax error at or near "UPDATE"
LINE 1: CREATE EXTENSION pgagent UPDATE;
^
postgres=# CREATE EXTENSION pgagent UPDATE;
ERROR: syntax error at or near "UPDATE"
LINE 1: CREATE EXTENSION pgagent UPDATE;
^
postgres=# CREATE EXTENSION pgagent ;
ERROR: extension "pgagent" has no installation script nor update path for version "3.4"
postgres=#

After failed. I try to build and compile the latest version 4.2.
Build and install successfully. But I still on unable upgrade current 3.4 version.
I could create extension successfully on a new database.

postgres=# \i pgagent--3.4--4.2.sql
Use "CREATE EXTENSION pgagent UPDATE" to load this file.
postgres=# CREATE EXTENSION pgagent UPDATE;
ERROR: syntax error at or near "UPDATE"
LINE 1: CREATE EXTENSION pgagent UPDATE;
^
postgres=# CREATE EXTENSION pgagent ;
ERROR: relation "pga_jobagent" already exists
postgres=#

[cid:image025(dot)png(at)01D6CD8D(dot)A032E0A0]

I think the message is wrong (which I'll fix if you can confirm) - it should be:

ALTER EXTENSION pgagent UPDATE;

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EDB: http://www.enterprisedb.com

Attachment Content-Type Size
pg_agent_12_7_after_upgrade4_0.log.tar.gz application/x-gzip 6.2 KB

In response to

Responses

Browse pgadmin-support by date

  From Date Subject
Next Message Dave Page 2020-12-08 11:09:40 Re: [External] Re: pgadmin--pgagent---the process hang by unknow reasons
Previous Message Junhyung Park 2020-12-08 00:55:07 Re: Installation Error Message

Browse pgsql-bugs by date

  From Date Subject
Next Message Dave Page 2020-12-08 11:09:40 Re: [External] Re: pgadmin--pgagent---the process hang by unknow reasons
Previous Message Andres Freund 2020-12-08 07:35:29 Re: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.