Re: Optimising Foreign Key checks

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimising Foreign Key checks
Date: 2013-06-02 09:45:21
Message-ID: CA+U5nML6BwWtVcfkg3nzr=k7fe8dx7VGKRM8Npk8NjsrYbatcQ@mail.gmail.com
Lists: pgsql-hackers

On 1 June 2013 21:27, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sat, Jun 01, 2013 at 09:41:13AM +0100, Simon Riggs wrote:
>> FK checks can be expensive, especially when loading large volumes of
>> data into an existing table or partition. A couple of ideas for
>> improving performance are discussed here:
>>
>> 1. Use Case: Bulk loading
>> COPY pgbench_accounts; --> references pgbench_branches with many
>> repeated values
>>
>> Proposal: Transactions that need multiple checks can be optimised by
>> simply LOCKing the whole referenced table, once. We can then hold the
>> referenced table as a Hash, like we do with a Hash Join (it's almost
>> exactly the same thing). This works in two ways: it speeds up checks
>> and it also reduces the locking overhead.
>
>> 2. Use Case: Transactional repetition
>> BEGIN;
>> INSERT INTO order VALUES (ordid, ....)
>> INSERT INTO order_line VALUES (ordid, 1, .....)
>> INSERT INTO order_line VALUES (ordid, 2, .....)
>> INSERT INTO order_line VALUES (ordid, 3, .....)
>> INSERT INTO order_line VALUES (ordid, 4, .....)
>
>> Proposal: De-duplicate multiple checks against same value. This would
>> be implemented by keeping a hash of rows that we had already either
>> inserted and/or locked as the transaction progresses, so we can use
>> the hash to avoid queuing up after triggers.
>
> I find (2) more promising than (1). It helps automatically, and it helps in
> more cases. The main advantage of (1) is avoiding the writes of tuple locks
> onto the PK table. Therefore, I expect (1) to distinguish itself over (2)
> when the referenced values are _not_ repeated too much. If referenced values
> are repeated, tuple locking costs would tend to disappear into the noise after
> the deduplication of (2).
>
>> In both cases we are building up a local hash table with values and
>> then using those values to avoid queuing constraint triggers. So code
>> is similar for both.
>>
>> Thoughts?

Thanks for your detailed reply.

> Will this add too much cost where it doesn't help? I don't know what to
> predict there. There's the obvious case of trivial transactions with no more
> than one referential integrity check per FK, but there's also the case of a
> transaction with many FK checks all searching different keys. If the hash hit
> rate (key duplication rate) is low, the hash can consume considerably more
> memory than the trigger queue without preventing many RI queries. What sort
> of heuristic could we use to avoid pessimizing such cases?

I've struggled with that for a while now. Probably all we can say is
that there might be one, and if there is not, then manual decoration
of the transaction will be the way to go.

> Same-transaction UPDATE or DELETE of the PK table, as well as subtransaction
> abort, potentially invalidates hash entries. I recommend thinking relatively
> early about how best to handle that.

Thanks, good point.

> Before deciding what to think overall, I needed to see a benchmark. I ran
> this simple one based on your scenarios:
>
> BEGIN;
> CREATE TABLE "order" (orderid int PRIMARY KEY);
> CREATE TABLE order_line (
>     orderid int,
>     lineno int,
>     PRIMARY KEY (orderid, lineno),
>     FOREIGN KEY (orderid) REFERENCES "order"
> );
> INSERT INTO "order" VALUES (1);
> INSERT INTO order_line SELECT 1, n FROM generate_series(1,1000000) t(n);
> ROLLBACK;
>
> See attached output from "perf report -s parent -g graph,5,caller"; I suggest
> browsing under "less -S". It confirms that the expensive part is something
> your proposals would address.

Thanks for posting this. It shows nicely that there is a problem with
repeated execution of SQL in constraint triggers. However, that isn't
the only problem since we must also consider memory usage and memory
scrolling. Currently, the after trigger queue builds up in memory and
doesn't scroll to disk, so overall memory usage is a problem in many
cases. Memory scrolling is also a problem since the after trigger
queue records the tids against which we need to execute triggers;
this causes us to re-access blocks that had scrolled out of memory,
thus defeating the bulk access strategy, as well as spoiling the cache.

For clarity, the 4 problems are:
1. SQL execution overhead
2. Memory usage
3. Memory scrolling
4. Locking overhead, specifically FPWs and WAL records from FK checks
probably in that order, or thereabouts.

The above is why I went for a technique that avoids SQL execution
entirely, as well as conserving memory by de-duplicating the values in
a hash table as we go, which addresses the first three problems. The
fourth is solved by the more extreme approach to locking.
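
For illustration, the manual analogue of that more extreme locking,
using the pgbench tables from use case 1, might look something like
the sketch below. The SHARE lock mode and COPY source are illustrative
assumptions only, and the proposal is for the backend to do the
equivalent automatically rather than by user decoration:

BEGIN;
-- One table-level lock on the referenced table, strong enough to stop
-- concurrent changes to the referenced keys; under the proposal the
-- per-row FOR KEY SHARE locks, and their WAL/FPW traffic, would then
-- be unnecessary during the load.
LOCK TABLE pgbench_branches IN SHARE MODE;
COPY pgbench_accounts FROM STDIN;
COMMIT;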

> A different way to help the bulk loading case would be to lock more keys with
> a single query. Currently, we issue a query like this for every INSERTed row:
>
> SELECT 1 FROM ONLY pktable WHERE pkcol = $1 FOR KEY SHARE OF x
>
> We could instead consider queued tuples in batches:
>
> SELECT 1 FROM ONLY pktable WHERE pkcol = ANY (ARRAY[$1,$2,...,$100]) FOR KEY SHARE OF x
>
> This would have the advantage of not adding a long-lived, potentially-large
> data structure and not depending on the rate of referenced key duplication.
> But it would not help non-bulk scenarios. (However, the user could make your
> example for (2) become a bulk scenario by deferring the FK constraint.)

At the moment we queue the after triggers first, then execute later,
so we can run out of memory before we execute any SQL. That means we
need to do something to prevent the after trigger queue from growing,
rather than simply optimise the way we execute things on the queue.

In terms of formal trigger behaviour, I am suggesting we redefine FK
triggers as if they were executing a WHEN clause that looks like this:
   WHEN check_not_already_queued()
where the function would have the side effect of maintaining an
in-memory hash that grows up to a certain size. I suggest work_mem for
most statements, but maintenance_work_mem for COPY.
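
To illustrate the shape of that, using the order/order_line tables
from the benchmark above: the sketch below is conceptual only.
check_not_already_queued() does not exist (its name and arguments here
are illustrative), fk_check_stub() merely stands in for the internal
RI check trigger function, and the real FK triggers are created
internally by the backend rather than by user DDL.

CREATE FUNCTION check_not_already_queued(conname text, keyval integer)
RETURNS boolean AS $$
    -- Real version: probe/insert the key into a per-transaction hash,
    -- bounded by work_mem (maintenance_work_mem for COPY), and return
    -- false when this key has already been inserted/locked, so the
    -- after trigger is never queued for it.
    SELECT true;
$$ LANGUAGE SQL;

CREATE FUNCTION fk_check_stub() RETURNS trigger AS $$
BEGIN
    RETURN NULL;  -- stands in for the internal FK check
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER order_line_orderid_fk_check
    AFTER INSERT ON order_line
    FROM "order"
    FOR EACH ROW
    WHEN (check_not_already_queued('order_line_orderid_fkey', NEW.orderid))
    EXECUTE PROCEDURE fk_check_stub();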

Thus the WHEN clause gets executed before we queue anything and so we
are able to minimise problems 2 and 3.

Having that formal definition also gives us a neat implementation
route, in that we are simply adding a WHEN clause to FK triggers.

The overall execution plan then looks a lot like a hash join, with
skew avoidance. We build up a hash table in memory, up to a certain
size, while letting non-matched rows spill to the after trigger queue
for later execution. Since we skip many of the checks via the WHEN
clause, the after trigger queue should be more manageable in size.
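
In planner terms, the analogy (using the benchmark tables again,
purely as an illustration; the proposed machinery would live in the
trigger code, not the planner) is roughly the plan produced by:

EXPLAIN
SELECT 1
  FROM order_line ol
  JOIN "order" o ON o.orderid = ol.orderid;
-- typically a Hash Join with "order" as the hashed inner side, which
-- is what the proposed in-memory de-duplication hash resembles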

I think it might be worth considering joining the after trigger queue
directly to the referenced table(s), something like this...

CREATE OR REPLACE FUNCTION after_trigger_queue() RETURNS SETOF tid AS $$
...
$$ LANGUAGE SQL;

EXPLAIN
SELECT 1 FROM ONLY "order"
WHERE orderid IN
    (SELECT orderid FROM ONLY order_line
     WHERE ctid IN (SELECT after_trigger_queue FROM after_trigger_queue()))
FOR KEY SHARE;

But we could optimise that even further if we had a "BlockScan", which
would be a block-oriented version of the tid scan where we simply
provide a bitmap of blocks needing to be scanned, just like the output
of a BitmapIndexScan. The reason for mentioning that here is that
parallel query will eventually need the ability to do a scan of a
subset of blocks, as does tablesample. So I can see 3 callers of such
a Scan type.
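
For comparison, the tuple-level facility that exists today is the Tid
Scan; an illustration with hard-coded example ctids:

EXPLAIN
SELECT * FROM order_line
 WHERE ctid = ANY (ARRAY['(0,1)'::tid, '(0,2)'::tid]);
-- a Tid Scan over individual tuple ids; the proposed BlockScan would
-- instead take a bitmap of whole blocks, like a BitmapIndexScan's output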

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
