Optimizing search query with sorting by creation field

From: Droid Tools <droidtools(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Optimizing search query with sorting by creation field
Date: 2021-05-07 00:42:33
Message-ID: CABbQ-pUrdq9+Rh8myo=JEze3Ezdq9Tj60knAFxZ59Aqs0WrWGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I'm looking for tips on optimizing a search query where someone searches
for content within a post and wants to sort the results by a timestamp. I
have a table `posts` with a `created` field (timestamp) and a post_tsv
column (TSVECTOR). Likewise I have a GIN Index on the `post_tsv` field and
a separate index on the `created` field.

My initial, naive, attempt was to simply to do something like:

```
SELECT * FROM posts WHERE post_tsv @@ websearch_to_tsquery(?) ORDER BY
created DESC
```

However, I didn't realize in this case the `created` index would be
ignored, which means if there was a large number of posts returned, this
query would take several seconds to execute. Also the planner would do
weird things even if the result set was small and still take several
seconds to execute. Currently I've papered over the problem by issuing a
subquery and sorting that instead.

```
SELECT * FROM
(SELECT * FROM posts WHERE post_tsv @@ websearch_to_tsquery(?) LIMIT
10,000) q
ORDER BY created DESC
```

In short I execute the search, limit that to 10,000 rows, and then order
the 10,000 rows that were returned. This worked amazingly for queries that
returned fewer than 10,000 rows as those queries went from taking several
seconds to run down to a handful of milliseconds. The problem is for
queries with more than 10,000 rows you essentially end up with random
results. I'm still not using the created index, but sorting 10,000 rows in
memory is relatively fast.

I'm stuck where to go from here - what I would like, since I know I will
only ever ORDER BY the created field is to build some index where the
default ordering is by the created field. GIN, as I understand it, doesn't
support indexing in this manner (using one of the columns as a sort field).
Is there anything else I could try?

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Atul Kumar 2021-05-07 06:34:44 idle_in_transaction_session_timeout
Previous Message Adrien Nayrat 2021-05-06 19:21:18 Re: "invalid contrecord" error on replica