Re: PostgreSQL and a Catch-22 Issue related to dead rows

From: "MichaelDBA(at)sqlexec(dot)com" <michaeldba(at)sqlexec(dot)com>
To: Lars Aksel Opsahl <Lars(dot)Opsahl(at)nibio(dot)no>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL and a Catch-22 Issue related to dead rows
Date: 2024-12-09 13:18:00
Message-ID: BF20FBB8-EEBA-4318-8732-AFBFA1BCE327@sqlexec.com
Lists: pgsql-performance

You could always turn off vacuuming at the table level and then resume it later.
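For example, assuming the hot tables are known in advance, per-table autovacuum can be toggled with storage parameters (a sketch; `my_topology_table` is a placeholder name):

```sql
-- Disable autovacuum for one table while the heavy batch job runs.
ALTER TABLE my_topology_table SET (autovacuum_enabled = false);

-- ... run the batch job ...

-- Re-enable autovacuum and clean up manually afterwards.
ALTER TABLE my_topology_table SET (autovacuum_enabled = true);
VACUUM (ANALYZE) my_topology_table;
```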

Sent from my iPhone


On Dec 9, 2024, at 6:03 AM, Lars Aksel Opsahl <Lars(dot)Opsahl(at)nibio(dot)no> wrote:


Hi


When processing multiple simple feature layers through PostGIS Topology to perform overlays and eliminate small areas/slivers, we face a complex workflow. To manage the workload, we split the input into smaller jobs using the Content Balanced Grid. These jobs are then executed in parallel using Postgres Execute Parallel.


In one case, we processed a total of 750 cells, with an overall runtime of 40 hours. However, one specific cell took over 12 hours to complete, most of which was spent on removing small areas by deleting edges in PostGIS Topology. The root cause of this delay is the accumulation of dead rows.


By introducing periodic COMMIT statements and VACUUM (FULL) operations, we managed to reduce the processing time for that single cell to approximately 3 hours. However, when we scaled this updated code to use 100 parallel threads, we encountered the expected “LWLock | SubtransControlLock” bottleneck, leading to an overall increase in runtime for all the cells.
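For context, SubtransControlLock contention typically appears when many concurrent backends open subtransactions, e.g. via SAVEPOINT or PL/pgSQL EXCEPTION blocks. A hypothetical sketch of the kind of loop that can trigger it (names and structure are invented, not the actual job code):

```sql
-- Each EXCEPTION block opens a subtransaction (an implicit SAVEPOINT),
-- so with 100 parallel workers this pattern can contend on
-- SubtransControlLock. Table and topology names are placeholders.
CREATE OR REPLACE FUNCTION remove_small_areas(cell_id int)
RETURNS void LANGUAGE plpgsql AS $$
DECLARE
  e record;
BEGIN
  FOR e IN SELECT edge_id FROM candidate_edges WHERE cell = cell_id LOOP
    BEGIN  -- one subtransaction per edge
      PERFORM topology.ST_RemEdgeModFace('my_topo', e.edge_id);
    EXCEPTION WHEN OTHERS THEN
      NULL;  -- skip edges that cannot be removed
    END;
  END LOOP;
END;
$$;
```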


One issue is the inability to accurately estimate the size of a job before execution, making it difficult to optimize job splitting in advance. Currently, the only viable solution seems to be implementing a timer for each job. The job would terminate after a predefined time, committing its progress and returning control to the caller. The caller must then trigger new jobs to complete the remaining tasks until all work is done.
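A time-boxed job of the kind described could look roughly like this (a sketch; `pending_work`, `do_one_unit`, and the function name are invented for illustration):

```sql
-- Process one cell's work units until done or until the time budget
-- is exhausted; the caller commits and reschedules the remainder.
CREATE OR REPLACE FUNCTION process_cell_with_budget(cell_id int, budget interval)
RETURNS boolean LANGUAGE plpgsql AS $$
DECLARE
  started timestamptz := clock_timestamp();
  w record;
BEGIN
  FOR w IN SELECT work_id FROM pending_work WHERE cell = cell_id LOOP
    PERFORM do_one_unit(w.work_id);  -- placeholder for the real step
    IF clock_timestamp() - started > budget THEN
      RETURN false;  -- time is up: commit progress, reschedule the rest
    END IF;
  END LOOP;
  RETURN true;  -- cell fully processed
END;
$$;
```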


If there were a way to remove dead rows without requiring a commit from totally unrelated jobs, things would be much easier. The tables in use are unlogged, and I already control the vacuum process from the caller; I can't use temp tables. Such a mechanism would also prevent unrelated long-running jobs from causing problems for my jobs. To avoid that problem now, I also have to move my jobs to a server not used by others.

Yes, there are very good reasons for the way dead-row removal works now, but is there any chance of adding an option at table creation time to disable this behavior, for instance for unlogged tables?

Thanks

Lars
