Re: Index for range queries on JSON (user defined fields)

From: Marco Colli <collimarco91(at)gmail(dot)com>
To: Nick Cleaton <nick(at)cleaton(dot)net>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Index for range queries on JSON (user defined fields)
Date: 2020-12-05 10:50:09
Message-ID: CAFvCgN4fZc4ZGw5z-y_xbj9sJrK9nMNgzDiUkXBDQQuunZgstQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks for the suggestion: I had already considered that solution (first
link), but the fear is having to JOIN large tables with hundreds of
millions of records.

For my understanding **using JOIN when dealing with big data is bad and a
nightmare for performance**: can you confirm? Or am I missing something?

That tables would be frequently read and updated and are the core of the
application: that also means that every update on a user would produce
**many dead rows** - not just 1 user row, as in the case of JSON, but many
rows in the user metadata table.

On Fri, Dec 4, 2020 at 11:40 PM Nick Cleaton <nick(at)cleaton(dot)net> wrote:

> On Fri, 4 Dec 2020 at 15:39, Marco Colli <collimarco91(at)gmail(dot)com> wrote:
>
>> Hello!
>>
>> We have a multi-tenant service where each customer has millions of users
>> (total: ~150M rows). Now we would like to let each customer define some
>> custom columns for his users and then let the customer search his users
>> efficiently based on these columns.
>>
>> This problem seems really hard to solve with PostgreSQL:
>>
>> https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields
>>
>> In particular the easiest way would be to add a JSON field on the users
>> table (e.g. user metadata). However the PostgreSQL GIN index only supports
>> exact matches and not range queries. This means that a query on a range
>> (e.g. age > 30) would be extremely inefficient and would result in a table
>> scan.
>>
>
> You could have a table of (tenant, customer, setting_name, setting_value)
> so that a btree index on (tenant, setting_name, setting_value) would work
> for "select customer from my_table where tenant=$1 and setting_name='age'
> and setting_value > 30"
>
> That doesn't deal with setting values having a variety of types, but you
> could have a distinct user defined settings table for each setting value
> type that you want to support.
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Atul Kumar 2020-12-11 08:12:40 "Required checkpoints occurs too frequently"
Previous Message Nick Cleaton 2020-12-04 22:39:45 Re: Index for range queries on JSON (user defined fields)