From: | Philip Semanchuk <philip(at)americanefficient(dot)com> |
---|---|
To: | postgres performance list <pgsql-performance(at)postgresql(dot)org> |
Subject: | Entire index scanned, but only when in SQL function? |
Date: | 2023-07-11 16:07:26 |
Message-ID: | 9D9DBF30-B6EF-49DB-864D-610849C37D41@americanefficient.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi there,
I’m on Postgres 13.11 and I'm seeing a situation where an INSERT...SELECT statement seq scans an index, but only when wrapped in a SQL function. When invoked directly (via psql) or when called via a PL/pgSQL function, it only reads the index tuples it needs, resulting in much better performance. I can solve my problem by writing the function in PL/pgSQL, but I'm curious why the pure SQL version behaves the way it does.
Here's my table --
\d documents
+-------------------+------------------+----------------------------------------+
| Column | Type | Modifiers |
|-------------------+------------------+----------------------------------------|
| document_id | integer | not null generated always as identity |
| product_id | integer | not null |
| units_sold | integer | not null |
| sale_date | date | not null |
... some other columns ...
+-------------------+------------------+----------------------------------------+
CREATE INDEX idx_philip_tmp on documents (document_id, product_id);
Here's the SQL function which will use that index --
CREATE OR REPLACE FUNCTION fn_create_tasks(product_ids int[])
RETURNS void
AS $$
-- Create processing tasks for documents related to these products
INSERT INTO
processing_queue (document_id)
SELECT
DISTINCT document_id
FROM
documents
JOIN unnest(product_ids::int[]) AS product_id USING (product_id)
;
$$ LANGUAGE sql VOLATILE PARALLEL SAFE;
96498 is a product_id that has one associated document_id. When I copy/paste this statement into psql, it executes quickly, and pg_stat_user_indexes.idx_tup_read reports 2 tuples read for the index.
INSERT INTO
processing_queue (document_id)
SELECT
DISTINCT document_id
FROM
documents
JOIN unnest(ARRAY[96498]::int[]) AS product_id USING (product_id)
;
When I copy/paste this into psql, I expect it to perform just as quickly but it does not. pg_stat_user_indexes.idx_tup_read reports 64313783 tuples read (which is the entire index).
SELECT fn_create_tasks(ARRAY[96498]::int[])
If I rewrite fn_create_tasks() in PL/pgSQL, it behaves as I expect (executes quickly, pg_stat_user_indexes.idx_tup_read = 2).
SELECT fn_create_tasks_plpgsql(ARRAY[96498]::int[])
My rule of thumb is that SQL functions always perform as well as or better than a PL/pgSQL equivalent, but this is a case where that's not true. If anyone can give me some clues as to what's happening here, I'd appreciate it.
Thanks
Philip
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2023-07-11 17:12:45 | Re: Performance implications of 8K pread()s |
Previous Message | Dimitrios Apostolou | 2023-07-10 14:28:51 | Performance implications of 8K pread()s |