Re: How should we design our tables and indexes

From: veem v <veema0000(at)gmail(dot)com>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: How should we design our tables and indexes
Date: 2024-02-11 07:23:10
Message-ID: CAB+=1TV=bAokrJhM9JjqfLTpemyr3SwbJLTFpXmfggtcw2qKCg@mail.gmail.com
Lists: pgsql-general

Thank you so much, Greg.

I will try to test things as much as possible. Basically, I was trying to see
whether there are any obvious things we should take care of before designing a
system to satisfy such requirements. You raised a few points, and I am trying
to answer them below.

On Sun, 11 Feb 2024 at 10:43, Greg Sabino Mullane <htamfids(at)gmail(dot)com>
wrote:

> There is a lot to unpack here. I'm going to take a quick pass, but you
> ought to consider getting some custom expert help.
>
> On Sat, Feb 10, 2024 at 2:39 PM veem v <veema0000(at)gmail(dot)com> wrote:
>
>
> Pagination is already a hard problem, and does not even make sense when
> combined with "a continuous stream of inserts". What should the user see
> when they click on page 2?
>

When the user clicks to the second page, they will see the next set of rows,
i.e. 100 to 200, then 200 to 300 on the next page, and so on until the result
set is exhausted.
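
For clarity, the paging I had in mind looks roughly like the below (table and
column names are just placeholders, not our real schema):

SELECT *
FROM   table1
ORDER  BY pr_time DESC
LIMIT  100 OFFSET 100;   -- "page 2"; rows can shift if inserts arrive between clicks

-- or a keyset-style variant, which I understand is more stable under a
-- continuous stream of inserts:
SELECT *
FROM   table1
WHERE  pr_time < :last_pr_time_seen_on_page_1
ORDER  BY pr_time DESC
LIMIT  100;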

> Also, 500 attributes?! What data types are those? If boolean, you may want
> to look at using bitfields.
>
> Using LIMIT does allow for this, with certain caveats. The best answer
> for a lot of these questions is "try it and see" with a simplified version
> of your queries against some dummy data.
>

All those attributes are mostly varchar and numeric in nature, so I am not
sure whether any options exist there for these.
Additionally, in other databases like Oracle we use hints such as
/*+ FIRST_ROWS */ to make the optimizer favour index paths for these kinds of
UI search queries, so I was wondering whether anything similar exists here.
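
For example, this is roughly what we do today in Oracle, and my (possibly
wrong) understanding of the PostgreSQL side, where an explicit LIMIT already
tells the planner that only the first rows matter (placeholder names):

-- Oracle style (current approach):
-- SELECT /*+ FIRST_ROWS(100) */ * FROM table1 WHERE cid = :cid ORDER BY pr_time DESC;

-- PostgreSQL sketch: the LIMIT itself should nudge the planner toward a
-- fast-start index plan rather than a full scan and sort.
SELECT *
FROM   table1
WHERE  cid = :cid
ORDER  BY pr_time DESC
LIMIT  100;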

> one Index on table1(MID) , one index Table1(CID), one index on
>> table2(ACN_NBR)?
>>
>
> This. One index for each important column. Especially if they vary between
> queries, as you alluded to later.
>
>
If PR_ID is a must in the join criteria between these tables, table1 and
table2, in all the queries, then is it advisable to have composite indexes
like (pr_id, mid), (pr_id, cid) etc. rather than indexes on the individual
columns?
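
To make the two alternatives concrete (index names are made up):

-- Option A: individual single-column indexes
CREATE INDEX idx_t1_mid ON table1 (mid);
CREATE INDEX idx_t1_cid ON table1 (cid);
CREATE INDEX idx_t2_acn ON table2 (acn_nbr);

-- Option B: composite indexes leading with the join column
CREATE INDEX idx_t1_prid_mid ON table1 (pr_id, mid);
CREATE INDEX idx_t1_prid_cid ON table1 (pr_id, cid);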

> select count(*) over() as total_record, * [large query omitted]
>>
>
> Queries starting with select count(*) are also a red flag, but it's hard
> to tell from this. Probably best to explain what you think the query is
> doing using regular words.
>
>
Actually the inner query is doing the main work, i.e. finding the search
results based on the input search criteria. The outer query just fetches the
results from the inner query along with count(*), to pass on to the API, so
that it can calculate and show the user how many pages there are in the full
result set. Basically it computes count(*)/N records per page, and that figure
is displayed on the first page of the UI screen.
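
A simplified shape of it is below (the real inner query is much larger and has
many more filters; names are placeholders):

SELECT count(*) OVER () AS total_record,   -- same total repeated on every returned row
       t.*
FROM  (SELECT *
       FROM   table1
       WHERE  cid = :cid
       ORDER  BY pr_time DESC) t
LIMIT 100;                                 -- first page only
-- The API then derives the page count as ceil(total_record / 100).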

> 3)One of the use cases is that the customer should be able to search on
>> certain attributes and should be able to see the transactions in "desc by
>> processing_date" i.e. latest transactions on the first page on the UI. And
>> in such scenario, if the search attribute is less unique and the customer
>> puts a date range of a month i.e. over 30 partitions , it may results in
>> scanning and sorting billions of rows to get the top/recent ~100
>> transactions and most likely not going to respond back in <1 sec, even goes
>> for the index. So how should we handle or design indexes for catering such
>> queries? For e.g. if we have the only filter on column "TABLE1.CID" in the
>> above query, which is very less unique then how to handle it?
>>
>
> Well, there is only so much a database can do if you are pulling from 500
> different attributes. But the whole purpose of an indexed column is to
> prevent having to scan *all* the rows. If it's a common value, then
> hopefully there is something else in the where clause more specific that is
> also indexed. As mentioned above, a LIMIT and ORDER BY, with the
> appropriate indexes, can return early.
>
>
So in this case, if the only filter is on column CID and the join condition is
on PR_ID between the two tables, no other filter exists to make the result set
smaller. Let's say the query really does match millions of rows, because the
CID column has very few distinct values. The results are also ordered on the
PR_TIME column to show the latest transactions on the first page, so an index
on CID alone won't be very effective. Is there any option to index a different
set of columns to make this type of query run faster (I am not sure whether
including the ORDER BY column pr_time in the index would make a difference)?
Or is there any other way to keep the data already sorted in the table, so we
don't have to do a heavy sort just to show the first 100 rows to the customer
out of the millions satisfying the search criteria?
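
For example, is something along these lines worth testing (just my guess,
placeholder names)?

-- Lead with the low-cardinality filter column, then the ORDER BY column, so
-- the rows matching cid = :cid can be read already sorted by pr_time and the
-- scan can stop after the first 100 rows instead of sorting millions.
CREATE INDEX idx_t1_cid_prtime ON table1 (cid, pr_time DESC);

SELECT *
FROM   table1
WHERE  cid = :cid
ORDER  BY pr_time DESC
LIMIT  100;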

> For e.g. if we set "max_parallel_workers_per_gather"=4 to speed up the
>> queries, then we will be able to serve only 32/4=8 concurrent user requests
>> at any point in time. If we are targeting to serve ~100 concurrent users ,
>> will it be advisable to change or we should test the system with default
>> i.e. not setting this parallel parameter?
>>
>
> Again, this is a TIAS (try it and see), but in general it's not the number
> of concurrent users, but the number of concurrent queries, which is a
> factor of how fast your queries are. Feed Postgres as many parallel workers
> as you can.
>
>
Sure, I will try to test and see how it behaves when the number of
simultaneous queries (here 32/4 = 8 concurrent queries) exceeds the
max_parallel_workers limit. Though I am expecting the further queries
exceeding the limit might just run serially, without parallel workers.
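
These are the settings I plan to play with during the test (values are just
for experimentation, not recommendations):

SHOW max_parallel_workers;                  -- pool shared by all parallel queries
SHOW max_parallel_workers_per_gather;
SET  max_parallel_workers_per_gather = 4;   -- per-session override for the test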

Regards
Veem
