I have a beefy server (40+ worker processes, 40GB+ shared buffers) and a table docs_text (key text, text text) with around 50M rows.
The text values are extracted from PDFs of 4-5 pages each.
I'm adding the following generated column to maintain a tsvector for each row:
ALTER TABLE docs_text ADD COLUMN ts tsvector GENERATED ALWAYS AS (to_tsvector('simple', left(text, 1048575))) STORED;
I expected this to be slow, but it has been running for 18 hours already, and I certainly hope I've done something wrong and there's a smarter way.
I considered incremental updates and/or triggers, but a generated column is a cleaner solution.
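For comparison, the trigger-plus-incremental-update alternative would look roughly like this. It is a minimal sketch, assuming PostgreSQL 12+ (which generated columns require anyway); the names docs_text_ts_update and docs_text_ts_trg and the key range 'a' to 'b' are hypothetical placeholders. Adding a nullable column with no default does not rewrite the table, and the backfill UPDATE can be split into key ranges and run from several sessions at once, whereas the ALTER TABLE rewrite runs in a single backend.

```sql
-- Plain nullable column with no default: a catalog-only change, no table rewrite.
ALTER TABLE docs_text ADD COLUMN ts tsvector;

-- Hypothetical trigger function: recompute ts whenever a row is inserted
-- or its text column changes, using the same expression as the generated column.
CREATE OR REPLACE FUNCTION docs_text_ts_update() RETURNS trigger AS $$
BEGIN
  NEW.ts := to_tsvector('simple', left(NEW.text, 1048575));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fires only on INSERT or on UPDATEs that touch the text column,
-- so the backfill below (which sets only ts) does not re-trigger it.
CREATE TRIGGER docs_text_ts_trg
  BEFORE INSERT OR UPDATE OF text ON docs_text
  FOR EACH ROW EXECUTE FUNCTION docs_text_ts_update();

-- Backfill existing rows one key range at a time (range is a placeholder).
-- The "ts IS NULL" filter makes each batch safe to re-run after an interruption.
UPDATE docs_text
SET ts = to_tsvector('simple', left(text, 1048575))
WHERE key >= 'a' AND key < 'b'
  AND ts IS NULL;
```

Running each key range in its own connection (and committing per batch) keeps individual transactions short, at the cost of the extra bookkeeping that the generated column avoids.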