Increasing IndexTupleData.t_info from uint16 to uint32

From:	Montana Low <montana(at)postgresml(dot)org>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Increasing IndexTupleData.t_info from uint16 to uint32
Date:	2024-01-18 05:10:05
Message-ID:	CAAjvh2Q6MVRip0AJuWe0TyHjhujmpJHFAmWtXZMefCVnZDJ17w@mail.gmail.com
Lists:	pgsql-hackers

Machine learning embedding sizes have grown rapidly over the last few
years, from 128 up to 4K dimensions, yielding additional value and quality
improvements. It's not clear when this growth will ease. The leading text
embedding models
<https://d2Vt1P04.na1.hs-sales-engage.com/Ctc/I9+23284/d2Vt1P04/Jl22-6qcW7lCdLW6lZ3mZW2xQSTk7n4W72W13GRJl8P4L5lW65mzGl6rb-LDW3fZkLK7wL6F7W7bLKRl3_KR8gW2H6rqc3sKVWHW4D9qCR2clFY9W7HsFTL26WFWlW5Hfcrf2QPpJMW3LsmXC7QBywcW6f_40K6rHypwN72RsbhFLs2vW6jlt_y6pFdc9V91HDm4pT7BnVccdx84tkPBQW6PJxqG2F_FLmV6W6fc5JT11jW8vC6FB5DCKGjW1854vH3kDmt-W9lxPZm4_rYkDW4L32gg19WxLrW6-_S_-5MBHNYW2MsMBv25m7NkW5tPMCP5x2DzRf7CCRKR04>
now generate vectors larger than the index tuple size that
IndexTupleData.t_info can represent.

The current index tuple size is stored in 13 bits of IndexTupleData.t_info,
which caps the size of an index tuple just below 2^13 = 8192 bytes. Vectors
implemented by pgvector
<https://d2Vt1P04.na1.hs-sales-engage.com/Ctc/I9+23284/d2Vt1P04/JkM2-6qcW6N1vHY6lZ3nBW1KW3_33qLHXZW5SdJZV6V1sGTW4c2GQ_3MLxkdW2lzQbs2W87JKW772TLX7BpFlQW8-WNlD7GgH2tW3yzJG98NPhgFW3QMP2h5CKxzKN4DD1QlzH6WrW1ByHLF3QYtPQW1W8HLB2Jl6vZW8C8pKB9fvQMtW7wJpwd3-8fwWW60mRbF435_NkW253WL721Q95QW20Z-xk3_22C0W1Thshf6-_qGbW6rz9tX72gbKyW5L9ktk1Vtn-dW8601Jv3ZfHxhW7ZW-6L86RX2ZW293jnQ921NT6f2Kg23K04>
currently use
a 32-bit float for each element, which limits vector size to about 2K
dimensions. That is no longer state of the art.
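
The arithmetic behind those limits can be sketched in a few lines of C. The
three mask values below match the definitions in PostgreSQL's
src/include/access/itup.h; the helper functions (tuple_size_from_info,
pack_info, max_float4_elements) are hypothetical names used only for
illustration, and the calculation ignores tuple header and alignment overhead,
so the real ceiling is somewhat lower.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask values as defined in src/include/access/itup.h */
#define INDEX_SIZE_MASK 0x1FFF  /* low 13 bits: tuple size in bytes */
#define INDEX_VAR_MASK  0x4000  /* tuple has variable-width attributes */
#define INDEX_NULL_MASK 0x8000  /* tuple has null attributes */

#define FLOAT4_BYTES 4          /* assumption: one 32-bit float per element */

/* Hypothetical helper: extract the size field from a 16-bit t_info. */
static size_t
tuple_size_from_info(uint16_t t_info)
{
    return (size_t) (t_info & INDEX_SIZE_MASK);
}

/* Hypothetical helper: combine a size and flag bits into a 16-bit t_info. */
static uint16_t
pack_info(size_t size, uint16_t flags)
{
    assert(size <= INDEX_SIZE_MASK);    /* larger sizes do not fit in 13 bits */
    return (uint16_t) (size | flags);
}

/*
 * Hypothetical helper: upper bound on the number of float4 elements that
 * fit in tuple_bytes, ignoring header and alignment overhead.
 */
static size_t
max_float4_elements(size_t tuple_bytes)
{
    return tuple_bytes / FLOAT4_BYTES;
}
```

With every bit of t_info set, tuple_size_from_info() yields 8191, the largest
size 13 bits can express, and max_float4_elements(8192) gives the 2K figure
quoted above.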

I've attached a patch that increases IndexTupleData.t_info from 16 bits to
32 bits, allowing for significantly larger index tuple sizes. I would guess
this patch is not a complete implementation that allows for migration from
previous versions, but it does compile and initdb succeeds. I'd be happy to
continue work if the core team is receptive to an update in this area, and
I'd appreciate any feedback the community has on the approach.

I imagine it might be worth hiding this change behind a compile time
configuration parameter similar to blocksize. I'm sure there are
implications I'm unaware of with this change, but I wanted to start the
discussion around a bit of code to see how much would actually need to
change.
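
To make the compile-time idea concrete, here is a minimal sketch of what a
build-time switch for the field width could look like. This is not the
attached patch: WIDE_INDEX_INFO, index_info_t, the INFO_* masks and both
helper functions are hypothetical names, and the 29-bit size field is just one
possible layout that keeps the two flag bits at the top of the word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time switch; not an existing PostgreSQL option. */
#ifndef WIDE_INDEX_INFO
#define WIDE_INDEX_INFO 1
#endif

#if WIDE_INDEX_INFO
typedef uint32_t index_info_t;
#define INFO_SIZE_MASK 0x1FFFFFFFu  /* assumed layout: low 29 bits hold size */
#define INFO_VAR_MASK  0x40000000u  /* has variable-width attributes */
#define INFO_NULL_MASK 0x80000000u  /* has null attributes */
#else
typedef uint16_t index_info_t;
#define INFO_SIZE_MASK 0x1FFFu      /* today's layout: low 13 bits hold size */
#define INFO_VAR_MASK  0x4000u
#define INFO_NULL_MASK 0x8000u
#endif

/* Read the size field regardless of the configured width. */
static size_t
info_get_size(index_info_t info)
{
    return (size_t) (info & INFO_SIZE_MASK);
}

/* Replace the size field while leaving the flag bits untouched. */
static index_info_t
info_set_size(index_info_t info, size_t size)
{
    assert(size <= INFO_SIZE_MASK);     /* size must fit in the size field */
    return (index_info_t) ((info & ~INFO_SIZE_MASK) | (size & INFO_SIZE_MASK));
}
```

Code that only goes through accessors like these would not need to know the
width, but the two builds would write different on-disk tuple headers, which
is the migration concern mentioned above.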

Also, I believe this is my first mailing list post in a decade or two, so let
me know if I've missed something important. BTW, thanks for all your work
over the decades!

Attachment	Content-Type	Size
32bit_index_info.patch	application/octet-stream	2.9 KB

Responses

Re: Increasing IndexTupleData.t_info from uint16 to uint32 at 2024-01-18 15:46:46 from Tom Lane
Re: Increasing IndexTupleData.t_info from uint16 to uint32 at 2024-01-18 16:22:24 from Matthias van de Meent