JSON vs Text + Regexp Index Searching

From: Eliot Gable <egable+pgsql-general(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: JSON vs Text + Regexp Index Searching
Date: 2014-02-25 15:08:59
Message-ID: CAD-6L_WZpd2oEPWK6jOzZYnxwHwytziFoSuwnxu7=ODiJ=+j3g@mail.gmail.com
Lists: pgsql-general

I am working on a project for which I require some assistance. Any input
would be appreciated.

We have a table with millions of records and dozens of columns. On some
systems, it takes up >10GB of disk space, and it runs on a single disk
which is rather slow (7200 RPM). We typically don't do any complicated
queries on it, so performance has not posed a problem previously. The table
is actually defined as three tables using inheritance. One child table holds
only the last 7 days of data, and a process moves the 8th day into the history
table for archiving. For performance reasons, we typically only query the
table with 7 days of data, which generally does not exceed 1 million rows.
In fact, it is typically under 50k rows for most systems. However, a few
systems may contain a few million records, at most. There are indices on
each of the fields we query, and we also put limits on the number of
records returned. The table is practically an append-only table.
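
For reference, the layout described above might be sketched as follows; the table and column names here are hypothetical, not our actual schema:

```sql
-- Hypothetical sketch of the inheritance layout described above.
CREATE TABLE records (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL
    -- ... dozens of other columns ...
);

-- Child table holding the last 7 days; most queries hit only this table.
CREATE TABLE records_recent  () INHERITS (records);
-- Child table holding everything older than 7 days.
CREATE TABLE records_history () INHERITS (records);

-- Inheritance does not copy indexes, so each child is indexed separately
-- on every column we query.
CREATE INDEX records_recent_created_at_idx ON records_recent (created_at);

-- Typical query: recent data only, with a limit.
SELECT * FROM records_recent
WHERE created_at >= now() - interval '1 day'
LIMIT 100;
```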

We are looking at adding some additional columns to the table, and planning
for some future features. One of those features will allow cross
referencing the records in this table with a list of IDs. If the record's
list of IDs contains the one we are looking for, we want to include the
record in our query. I advocated creating a separate mapping table which
maps the ID of these records to the other ID we are searching for and
performing a JOIN on the two tables with appropriate foreign key
relationships and indices. However, I was asked instead to put the list into
a single column on each row to reduce implementation complexity.
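
Concretely, the mapping-table design I advocated would look something like this (names hypothetical):

```sql
-- One row per (record, cross-reference ID) pair.
CREATE TABLE record_xref (
    record_id bigint NOT NULL,   -- matches the main table's ID
    xref_id   bigint NOT NULL,
    PRIMARY KEY (record_id, xref_id)
);
-- Index to drive lookups by the cross-reference ID.
CREATE INDEX record_xref_xref_id_idx ON record_xref (xref_id);

-- Records whose ID list contains 67, via a plain indexed join.
SELECT r.*
FROM records r
JOIN record_xref x ON x.record_id = r.id
WHERE x.xref_id = 67
LIMIT 100;
```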

Assuming the list of IDs is in a column on each row as TEXT in the format
of a JSON array, what is the best way to index the column so I can quickly
find the rows with the given ID?

Limitations:

We run version 9.0.x on these systems right now. We plan on updating later
this year, but for now we don't have the JSON type. Once we do, we plan to
make use of it. The column value will also be passed to the UI, which is
JavaScript-based, so a JSON format would be ideal.

We cannot modify the disks we run on to make them faster. It is cost
prohibitive in our particular scenario.

The systems in question are real-time systems, and are carefully optimized.
Any excessively lengthy queries which have high disk IO or CPU usage will
adversely affect the entire system.

My Thoughts:

Is it possible to use a REGEXP INDEX in this case? For example: column ~
'(^\[|,)67(,|\]$)'
Will such an index perform well? How would I create it?
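
As far as I can tell, a plain btree index cannot serve a pattern match at an arbitrary position in the string, so the query would look something like the sketch below (id_list is a hypothetical column name), with index support only available via pg_trgm on newer releases:

```sql
-- Match the ID at the start, middle, or end of the JSON array text,
-- without also matching a longer ID such as 678 or 167.
-- (Assumes standard_conforming_strings = on; otherwise double the
-- backslashes or use an E'' literal.)
SELECT * FROM records
WHERE id_list ~ '(^\[|,)67(,|\]$)';

-- A btree index on id_list cannot serve this predicate. On later
-- releases, a trigram index can: pg_trgm accelerates LIKE from 9.1
-- and regexes from 9.3, so neither helps on our 9.0 systems.
CREATE EXTENSION pg_trgm;
CREATE INDEX records_id_list_trgm_idx
    ON records USING gin (id_list gin_trgm_ops);
```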

How badly would a simple LIKE operation perform? Is there any way to
optimize it?
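
For comparison, a LIKE version of the same search might look like this (again with hypothetical names); only the first pattern is a left-anchored prefix, so without a trigram index the others force a sequential scan:

```sql
-- Delimiters keep '67' from matching '678' or '167'.
SELECT * FROM records
WHERE id_list LIKE '[67,%'
   OR id_list LIKE '%,67,%'
   OR id_list LIKE '%,67]'
   OR id_list = '[67]';
```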

Would using a JSON type in the future allow us to build a better index on
the column?

If we wanted to put an index on a JSON-typed column whose value is a
simple array of IDs, what would the index look like, and how would we
construct the query to make use of it?
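
My understanding of where this could go once we upgrade: if a future release offers a binary JSON type with containment operators (as the planned jsonb type in 9.4 is expected to), something like the following sketch should work (column name hypothetical):

```sql
-- Assuming id_list is a jsonb column: a GIN index on it supports the
-- containment operator @>.
CREATE INDEX records_id_list_gin_idx ON records USING gin (id_list);

-- Rows whose JSON array contains the element 67; this can use the index.
SELECT * FROM records WHERE id_list @> '[67]'::jsonb;
```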

Thanks in advance for any input / insight on this.
