Re: How should we design our tables and indexes

From: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
To: veem v <veema0000(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: How should we design our tables and indexes
Date: 2024-02-11 05:13:09
Message-ID: CAKAnmmL=x_3Ga_aBa1UQfHjcCikV7SJX5iM9n4b0bb76cgZVyQ@mail.gmail.com
Lists: pgsql-general

There is a lot to unpack here. I'm going to take a quick pass, but you
ought to consider getting some custom expert help.

On Sat, Feb 10, 2024 at 2:39 PM veem v <veema0000(at)gmail(dot)com> wrote:

> ... These will be pagination queries. These read queries will be on big
> transaction tables (will have ~500+ attributes approx will have approx.
> rows size of ~1KB) having a continuous stream of inserts
>

Pagination is already a hard problem, and does not even make sense when
combined with "a continuous stream of inserts". What should the user see
when they click on page 2?
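One common workaround is keyset (a.k.a. cursor) pagination: instead of OFFSET, remember the sort key of the last row shown and continue from there, so newly inserted rows don't shift earlier pages. A minimal sketch, with hypothetical table and column names:

```sql
-- Keyset pagination sketch (table/column names are made up for illustration).
-- :last_date and :last_id come from the final row of the previous page.
-- Wants an index on (processing_date, txn_id) to walk in order.
SELECT txn_id, processing_date, amount
FROM   transactions
WHERE  (processing_date, txn_id) < (:last_date, :last_id)
ORDER  BY processing_date DESC, txn_id DESC
LIMIT  100;
```

The trade-off is that you can't jump to an arbitrary page number, only "next page" — which is often exactly what a streaming-inserts UI should offer anyway.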

Also, 500 attributes?! What data types are those? If many of them are
booleans, you may want to look at packing them into bit fields (e.g. a bit
varying column) instead of hundreds of separate columns.

> 1) Is there any way in postgres to influence the optimizer for the
> "first_row" optimization, so that it won't go for evaluating all the rows
> from such UI search queries. As because these will be pagination queries
> and the user will be interested in seeing top 100 transactions in the first
> page asap?
>

Using LIMIT does allow for this, with certain caveats. The best answer for
a lot of these questions is "try it and see" with a simplified version of
your queries against some dummy data.
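A "try it and see" sketch of that idea — generate some dummy rows, index the sort column, and check with EXPLAIN that the LIMIT lets the plan stop early (table name is made up):

```sql
-- Build a million dummy rows with a timestamp column, then inspect the plan.
CREATE TABLE t AS
SELECT g AS id,
       now() - (g || ' seconds')::interval AS processing_date
FROM   generate_series(1, 1000000) AS g;

CREATE INDEX ON t (processing_date DESC);
ANALYZE t;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t ORDER BY processing_date DESC LIMIT 100;
```

If the plan shows an index scan producing only ~100 rows, the "first rows" behavior you want is happening; a sort node over the whole table means it is not.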

> one Index on table1(MID) , one index Table1(CID), one index on
> table2(ACN_NBR)?
>

This. One index for each important column. Especially if they vary between
queries, as you alluded to later.

> select count(*) over() as total_record, * [large query omitted]
>

Queries starting with select count(*) are also a red flag, but it's hard to
tell from this. Probably best to explain what you think the query is doing
using regular words.
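Part of why it's a red flag: count(*) OVER () forces the entire result set to be computed just to attach a total to each row, which defeats any early-return from a LIMIT. Two lighter patterns people use instead (table name hypothetical):

```sql
-- (a) Exact count as a separate query, which the app can cache:
SELECT count(*) FROM transactions WHERE cid = :cid;

-- (b) The planner's table-size estimate from the catalog — whole table,
--     no filters, but instant; fine for "about N rows" UI text:
SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = 'transactions'::regclass;
```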

> 3) One of the use cases is that the customer should be able to search on
> certain attributes and should be able to see the transactions in "desc by
> processing_date" i.e. latest transactions on the first page on the UI. And
> in such scenario, if the search attribute is less unique and the customer
> puts a date range of a month i.e. over 30 partitions , it may results in
> scanning and sorting billions of rows to get the top/recent ~100
> transactions and most likely not going to respond back in <1 sec, even goes
> for the index. So how should we handle or design indexes for catering such
> queries? For e.g. if we have the only filter on column "TABLE1.CID" in the
> above query, which is very less unique then how to handle it?
>

Well, there is only so much a database can do if you are pulling from 500
different attributes. But the whole purpose of an indexed column is to
prevent having to scan *all* the rows. If it's a common value, then
hopefully there is something else in the where clause more specific that is
also indexed. As mentioned above, a LIMIT and ORDER BY, with the
appropriate indexes, can return early.
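Concretely, the trick for that low-cardinality CID case is a composite index whose leading column matches the filter and whose trailing column matches the sort, so Postgres can walk the index in order and stop after 100 rows. A sketch using your column names:

```sql
-- Filter column first, sort column second, matching the ORDER BY direction.
CREATE INDEX ON table1 (cid, processing_date DESC);

SELECT *
FROM   table1
WHERE  cid = :cid
ORDER  BY processing_date DESC
LIMIT  100;
```

With date-range partitioning each partition contributes its own ordered stream, but the planner can still merge those and honor the LIMIT without sorting billions of rows.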

> For e.g. if we set "max_parallel_workers_per_gather"=4 to speed up the
> queries, then we will be able to serve only 32/4=8 concurrent user requests
> at any point in time. If we are targeting to serve ~100 concurrent users ,
> will it be advisable to change or we should test the system with default
> i.e. not setting this parallel parameter?
>

Again, this is a TIAS (try it and see), but in general it's not the number
of concurrent users, but the number of concurrent queries, which is a
factor of how fast your queries are. Feed Postgres as many parallel workers
as you can.
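For reference, the knobs involved look like this (the values are illustrative, not recommendations — measure under your own load):

```sql
-- Total background worker slots; changing this needs a server restart.
ALTER SYSTEM SET max_worker_processes = 32;
-- How many of those slots parallel queries may use in total.
ALTER SYSTEM SET max_parallel_workers = 32;
-- Per-query cap on workers for a single Gather node.
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;

SELECT pg_reload_conf();  -- applies the last two; the first needs a restart
```

Note the per-gather setting is a per-query maximum, not a reservation: a short query releases its workers immediately, so the 32/4=8 arithmetic only bites if eight parallel queries are genuinely running at the same instant.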

tl;dr Less than 1 second response time is *probably* reachable given your
parameters, but parts of it are too vague to state for certain.

Cheers,
Greg
