Re: Background worker assistance & review

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Keith Fiske <keith(at)omniti(dot)com>
Cc: PGSQL Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Background worker assistance & review
Date: 2015-04-10 03:56:07
Message-ID: CAMsr+YHNfU7f+F6Rcvtwjqmb0EefYTXOuzQ3WQnwCa_dXavS5Q@mail.gmail.com
Lists: pgsql-general

On 9 April 2015 at 05:35, Keith Fiske <keith(at)omniti(dot)com> wrote:

> I'm working on a background worker (BGW) for my pg_partman extension. I've
> gotten the basics of it working for my first round, but there's two
> features I'm missing that I'd like to add before release:
>
> 1) Only allow one instance of this BGW to run
>

Load your extension in shared_preload_libraries, so that _PG_init runs in
the postmaster. Register a static background worker then.

If you need one worker per database (because it needs to access the DB)
this won't work for you, though. What we do in BDR is have a single static
background worker that's launched by the postmaster, which then launches
and terminates per-database workers that do the "real work".

Because of a limitation in the bgworker API in releases 9.4 and older, the
static worker has to connect to a database if it wants to access shared
catalogs like pg_database. This limitation has been lifted in 9.5, which
also removes the need to connect by database name rather than by oid
(which left bgworkers unable to handle RENAME DATABASE).

(We still really need a hook on CREATE DATABASE too)

> 2) Create a bgw_terminate_partman() function to stop it more intuitively
> than doing a pg_cancel_backend() on the PID
>

If you want it to be able to be started/stopped dynamically, you should
probably use RequestAddinShmemSpace to reserve a small shared memory
block. Use that to register the PGPROC for the current worker when the
worker starts, and add a boolean field you can use to ask it to terminate
itself. You'll also need an LWLock to protect access to the segment, so
you don't have races between a worker starting and the user asking to
cancel it, etc.
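
As a rough illustration of the handshake described above, here is a
standalone C model of such a control block. It is a sketch only: PartmanCtl
and the ctl_* functions are hypothetical names, a plain pid stands in for
the PGPROC, and the comments mark where an extension would take and release
the LWLock. It builds without PostgreSQL headers and does no real locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical control block; in an extension this would live in the
 * shared memory segment and be guarded by an LWLock. */
typedef struct PartmanCtl
{
    int  worker_pid;            /* 0 means no worker is attached */
    bool terminate_requested;   /* set by the user, read by the worker */
} PartmanCtl;

/* Called by the worker at startup. Returns false if another worker is
 * already attached, which also gives "only one instance" behaviour. */
bool
ctl_attach(PartmanCtl *ctl, int pid)
{
    assert(ctl != NULL);
    /* extension: acquire the LWLock exclusively here */
    if (ctl->worker_pid != 0)
        return false;
    ctl->worker_pid = pid;
    ctl->terminate_requested = false;
    /* extension: release the LWLock here */
    return true;
}

/* Called on behalf of the user. Returns false if no worker is running,
 * so a stale request cannot kill a worker that starts later. */
bool
ctl_request_terminate(PartmanCtl *ctl)
{
    assert(ctl != NULL);
    /* extension: acquire the LWLock exclusively here */
    if (ctl->worker_pid == 0)
        return false;
    ctl->terminate_requested = true;
    /* extension: release the LWLock, then wake the worker */
    return true;
}

/* Polled by the worker in its main loop. When it returns true the worker
 * has detached itself and should exit. */
bool
ctl_should_exit(PartmanCtl *ctl)
{
    assert(ctl != NULL);
    /* extension: acquire the LWLock exclusively here */
    if (!ctl->terminate_requested)
        return false;
    ctl->worker_pid = 0;
    ctl->terminate_requested = false;
    /* extension: release the LWLock here */
    return true;
}
```

A bgw_terminate_partman() SQL function could then be a thin wrapper around
something like ctl_request_terminate(), with the worker calling
ctl_should_exit() each time it wakes.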

Unfortunately the BackgroundWorkerHandle struct is opaque, so you cannot
store it in shared memory when it's returned by
RegisterDynamicBackgroundWorker() and use it to later check the worker's
status or ask it to exit. You have to use regular backend manipulation
functions and PGPROC instead.

Personally, I suggest that you leave the worker as a static worker, and
leave it always running when the extension is active. If it isn't doing
anything, have it sleep on its latch, then set its latch from other
processes when something interesting happens. (You can put the process
latch from PGPROC into your shmem segment so you can set it from elsewhere,
or allocate a new latch).
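
To make the sleep-and-wake pattern concrete, here is a small standalone C
model of it. This is a sketch under stated assumptions, not PostgreSQL code:
ModelLatch, model_set_latch() and worker_step() are hypothetical names, and
the comments note which real latch call each step loosely corresponds to. In
a real worker the "slept" branch would block rather than return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a process latch: a single "wake me" flag. */
typedef struct ModelLatch
{
    bool is_set;
} ModelLatch;

typedef enum StepResult
{
    WORKER_SLEPT,   /* nothing to do; a real worker would block here */
    WORKER_RAN      /* the worker woke up and did one round of work */
} StepResult;

/* What another process does when something interesting happens
 * (loosely comparable to SetLatch()). Setting twice is harmless. */
void
model_set_latch(ModelLatch *latch)
{
    assert(latch != NULL);
    latch->is_set = true;
}

/* One pass through the worker's main loop. */
StepResult
worker_step(ModelLatch *latch, int *runs)
{
    assert(latch != NULL && runs != NULL);

    if (!latch->is_set)
        return WORKER_SLEPT;    /* real worker: wait on the latch */

    latch->is_set = false;      /* loosely comparable to ResetLatch() */
    (*runs)++;                  /* stand-in for one maintenance run */
    return WORKER_RAN;
}
```

Note that several wake-ups arriving before the worker runs collapse into a
single round of work, so the worker should re-check whatever state it cares
about each time it wakes rather than counting wake-ups.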

> This is my first venture into writing C code for postgres, so I'm not
> familiar with a lot of the internals yet. I read
> http://www.postgresql.org/docs/9.4/static/bgworker.html and I see it
> mentioning how you can check the status of a BGW launched dynamically and
> the function to terminate one, but I'm not clear how you can get the
> information on a currently running BGW to do these things.
>

You can't. It's a pretty significant limitation in the current API. There's
no way to enumerate bgworkers via the bgworker API, only via PGPROC.

> I used the worker_spi example for a lot of this, so if there's any
> additional guidance for a better way to do what I've done, I'd appreciate
> it. All I really have it doing now is calling the run_maintenance()
> function at a defined interval and don't need it doing more than that yet.
> <http://www.keithf4.com>
>

The BDR project has an extension with much more in-depth use of background
workers, but it's probably *too* complicated. We have a static bgworker
that launches and terminates dynamic bgworkers (per-database) that in turn
launch and terminate more dynamic background workers (per-connection to
peer databases).

If you're interested, all the code is mirrored on github:

https://github.com/2ndquadrant/bdr/tree/bdr-plugin/next

and the relevant parts are:

https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.c#L640
https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_perdb.c
https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_supervisor.c
https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_shmem.c
https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_apply.c#L2401
https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.h

... but there's a *lot* of code there.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
