Quick Links

PostgreSQL and a Catch-22 Issue related to dead rows

From:	Lars Aksel Opsahl <Lars(dot)Opsahl(at)nibio(dot)no>
To:	"pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject:	PostgreSQL and a Catch-22 Issue related to dead rows
Date:	2024-12-09 11:02:53
Message-ID:	AM7P189MB10281DCA1FD196A5610484B29D3C2@AM7P189MB1028.EURP189.PROD.OUTLOOK.COM
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi
When processing multiple simple feature layers through PostGIS Topology to perform overlays and eliminate small areas/slivers, we face a complex workflow. To manage the workload, we split the input into smaller jobs using the Content Balanced Grid<https://gitlab.com/nibioopensource/content_balanced_grid>. These jobs are then executed in parallel using Postgres Execute Parallel<https://gitlab.com/nibioopensource/postgres_execute_parallel>.
In one case, we processed a total of 750 cells, with an overall runtime of 40 hours. However, one specific cell took over 12 hours to complete, most of which was spent on removing small areas by deleting edges in PostGIS Topology. The root cause of this delay is related to removal of dead rows.
By introducing periodic COMMIT statements and VACUUM (FULL) operations, we managed to reduce the processing time for that single cell to approximately 3 hours. However, when we scaled this updated code to use 100 parallel threads, we encountered the expected “LWLock | SubtransControlLock” bottleneck, leading to an overall increase in runtime for all the cells.
One issue is the inability to accurately estimate the size of a job before execution, making it difficult to optimize job splitting in advance. Currently, the only viable solution seems to be implementing a timer for each job. The job would terminate after a predefined time, committing its progress and returning control to the caller. The caller must then trigger new jobs to complete the remaining tasks until done.
If there were a way to remove dead rows without requiring a commit from totally unrelated jobs, it would be much easier. The tables in use are unlogged and I already control the vacuum process from the caller. I can't use temp tables. Such a mechanism would also prevent unrelated long running jobs to cause problems for my jobs. To avoid that problem now, I have to move my jobs to a server not used by others also.

Yes there are very good reason for the way removal for dead rows work now, but is there any chance of adding an option when creating table to disable this behavior for instance for unlogged tables ?

Thanks

Lars

Responses

Re: PostgreSQL and a Catch-22 Issue related to dead rows at 2024-12-09 13:18:00 from MichaelDBA@sqlexec.com
Re: PostgreSQL and a Catch-22 Issue related to dead rows at 2024-12-09 13:35:20 from Greg Sabino Mullane
Re: PostgreSQL and a Catch-22 Issue related to dead rows at 2024-12-09 14:25:45 from Rick Otten
Re: PostgreSQL and a Catch-22 Issue related to dead rows at 2024-12-09 14:32:59 from Christophe Pettus

Browse pgsql-performance by date

	From	Date	Subject
Next Message	MichaelDBA@sqlexec.com	2024-12-09 13:18:00	Re: PostgreSQL and a Catch-22 Issue related to dead rows
Previous Message	jian he	2024-12-09 06:16:04	Re: proposal: schema variables