Re: Large expressions in indexes can't be stored (non-TOASTable)

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Large expressions in indexes can't be stored (non-TOASTable)
Date: 2024-09-04 19:20:33
Message-ID: c500a379-7a20-4ab4-bcc9-5ea9aabe017a@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9/4/24 3:08 PM, Tom Lane wrote:
> Nathan Bossart <nathandbossart(at)gmail(dot)com> writes:
>> Thanks to commit 96cdeae, only a few catalogs remain that are missing TOAST
>> tables: pg_attribute, pg_class, pg_index, pg_largeobject, and
>> pg_largeobject_metadata. I've attached a short patch to add one for
>> pg_index, which resolves the issue cited here. This passes "check-world"
>> and didn't fail for a few ad hoc tests (e.g., VACUUM FULL on pg_index). I
>> haven't spent too much time investigating possible circularity issues, but
>> I'll note that none of the system indexes presently use the indexprs and
>> indpred columns.
>
> Yeah, the possibility of circularity seems like the main hazard, but
> I agree it's unlikely that the entries for system indexes could ever
> need out-of-line storage. There are many other things that would have
> to be improved before a system index could use indexprs or indpred.

Agreed on the unlikeliness of that, certainly in the short-to-mid term.
The impetus driving this is dealing with a data type that can be quite
large, and it's unlikely system catalogs will be dealing with anything
of that nature, or requiring very long expressions that couldn't be
encapsulated in a different way.

Just to be fair, in the case I presented there's an argument that what
I'm trying to do is fairly inefficient for an expression, given I'm
passing around an additional several KB payload into the query. However,
we'd likely have to do that anyway for this problem space, and the
overall performance hit is negligible compared to the search relevancy
boost.

I'm working on a much more robust test, but using a known 10MM 768-dim
dataset and two C-based quantization functions (one using the
expression), I got a 3% relevancy boost with a 2% reduction in latency
and throughput. On some other known datasets, I was able to improve
relevancy 40% or more, though given they were initially returning with
0% relevancy in some cases, it's not fair to compare performance numbers.

There are other ways to solve the problem as well, but allowing for the
larger expression gives users more choices in how they can approach it.

Jonathan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2024-09-04 19:57:20 Re: race condition in pg_class
Previous Message Tom Lane 2024-09-04 19:08:13 Re: Large expressions in indexes can't be stored (non-TOASTable)