Re: RFC: Logging plan of the running query

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Étienne BERSAC <etienne(dot)bersac(at)dalibo(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, jtc331(at)gmail(dot)com, rafaelthca(at)gmail(dot)com
Subject: Re: RFC: Logging plan of the running query
Date: 2024-02-13 00:19:03
Message-ID: 1b2b247530f3ff3afab4ddc2df222e8b@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2024-02-07 19:14, torikoshia wrote:
> On 2024-02-07 13:58, Ashutosh Bapat wrote:

>> The prologue refers to a very populated
>> lock hash table. I think that will happen if thousands of tables are
>> queried in a single query OR a query runs on a partitioned table with
>> thousands of partitions. May be we want to try that scenario.
>
> OK, I'll try such cases.

I measured this using partitioned pgbench_accounts with some
modification to v36[1].
The results[2] show that CPU time increases in proportion to the number
of partitions, and the increase is not that large.

However I've noticed that these ensuring no page lock logic would not be
necessary anymore since cc32ec24fdf3b98 removed the assertion which
caused an error[1].

$ git show cc32ec24fdf3b98
..
diff --git a/src/backend/storage/lmgr/lock.c
b/src/backend/storage/lmgr/lock.c
index 0a692ee0a6..f595bce31b 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -186,18 +186,6 @@ static int FastPathLocalUseCount = 0;
*/
static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY =
false;

- /*
- * We don't acquire any other heavyweight lock while holding
the page lock
- * except for relation extension.
- */
- Assert(!IsPageLockHeld ||
- (locktag->locktag_type ==
LOCKTAG_RELATION_EXTEND));

I'm going to remove ensuring no page lock logic after some testings.

[1]
$ git diff _submission/log_running_query-v36
+#include "time.h"
+
bool ProcessLogQueryPlanInterruptActive = false;

/* Hook for plugins to get control in ExplainOneQuery() */
@@ -5258,6 +5260,10 @@ ProcessLogQueryPlanInterrupt(void)
MemoryContext old_cxt;
LogQueryPlanPending = false;

+ clock_t start, end;
+ double cpu_time_used;
+ int num_hash_entry = 0;
+
/* Cannot re-enter. */
if (ProcessLogQueryPlanInterruptActive)
return;
@@ -5287,9 +5293,11 @@ ProcessLogQueryPlanInterrupt(void)
* we check all the LocalLock entries and when finding even one,
give up
* logging the plan.
*/
+ start = clock();
hash_seq_init(&status, GetLockMethodLocalHash());
while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
{
+ num_hash_entry++;
if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
{
ereport(LOG_SERVER_ONLY,
@@ -5301,6 +5309,12 @@ ProcessLogQueryPlanInterrupt(void)
return;
}
}
+ end = clock();
+ cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
+
+ ereport(LOG,
+ errmsg("locallock entry search took: %f for %d entries",
cpu_time_used, num_hash_entry));

[2]
# partition number: 512
locallock entry search took: 0.000029 for 1026 entries
locallock entry search took: 0.000030 for 1026 entries
locallock entry search took: 0.000036 for 1026 entries

# partition number: 1024
locallock entry search took: 0.000070 for 2050 entries
locallock entry search took: 0.000059 for 2050 entries
locallock entry search took: 0.000049 for 2050 entries

# partition number: 2048
locallock entry search took: 0.000100 for 4098 entries
locallock entry search took: 0.000103 for 4098 entries
locallock entry search took: 0.000101 for 4098 entries

# partition number: 4096
locallock entry search took: 0.000197 for 8194 entries
locallock entry search took: 0.000193 for 8194 entries
locallock entry search took: 0.000192 for 8194 entries

[3]
https://www.postgresql.org/message-id/0642712f-1298-960a-a3ba-e256d56040ac%40oss.nttdata.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-02-13 01:15:24 RE: Synchronizing slots from primary to standby
Previous Message Andres Freund 2024-02-13 00:11:50 Re: Improve WALRead() to suck data directly from WAL buffers when possible