Question: consolidating strpos searches?

From: James Addison <jay(at)jp-hosting(dot)net>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Question: consolidating strpos searches?
Date: 2025-01-04 17:16:06
Message-ID: CALDQ5Ny-+ZxLuJiqC11EafO11F-KVSBgWnrWqWpRt4bw1L+hhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello and Happy New Year,

I'd like to validate or reject a performance-related PostgreSQL patch
idea, regarding multiple strpos calls that operate on a common text
input.

From local testing using PGSQL 17.2, and searching the mailing lists,
my understanding is that each function expression (including any that
are duplicates) in a SQL query is evaluated independently.

To check that, I constructed a pessimal demonstration (psql session
snippet attached here for reference) by generating an extremely long
random text blob ending with the phrase "known suffix".

Running queries against that text locally I encountered the runtimes
(annotated in the SQL comments) below:

WHERE strpos(v, 'known') > 0; -- 2 seconds
WHERE strpos(v, 'known') > 0 AND strpos(v, 'suffix') > 0; -- 4 seconds
WHERE strpos(v, 'known') > 0 AND strpos(v, 'suffix') > 0 AND
strpos(v, 'suffix') > 0; -- 6 seconds
WHERE strpos(v, 'known') > 0 AND strpos(v, 'suffix') > 0 AND
strpos(v, 'suffix') > 0 AND strpos(v, 'suffix') > 0; -- 8 seconds

In other words: each additional strpos(value, ...) expression
increased the evaluation time by a similar, significant duration of
two seconds. This seems to confirm the basis that each expression is
currently evaluated separately, by an independent read from the input
text.

I'd like to suggest the introduction of a documented multiple string
matching algorithm[1], to yield results for each of multiple strpos
calls while only reading through their common input text at-most once.

In the context of considering writing a patch: would the complexity of
implementing such a feature for PostgreSQL be worth the potential
performance benefits? And either way, is there more I should learn
about and consider? How would I provide convincing supporting
evidence if I do write a patch?

Thank you,
James

[1] Such as those described in "Flexible Pattern Matching in Strings",
authored by G Navarro and M Raffinot, 2002, published by Cambridge
University Press

Attachment Content-Type Size
incremental_strpos_cost.txt text/plain 3.3 KB

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2025-01-04 17:45:11 Re: Question: consolidating strpos searches?
Previous Message Pavel Stehule 2025-01-04 05:36:56 Re: Re: proposal: schema variables