| From: | Marcin Barczynski <mbarczynski(at)starfishstorage(dot)com> |
|---|---|
| To: | pgsql-performance(at)postgresql(dot)org |
| Subject: | Slow vacuum of GIST indexes, because of random reads on PostgreSQL 9.6 |
| Date: | 2017-09-06 11:57:25 |
| Message-ID: | CAOhG4wem20oFCyZhGW-WyqJcno45z2trPnvsDrDUcPbOytFycg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-performance |
I am using a GIST index on timestamp range, because it supports 'contains'
operator ('@>'). Unfortunately, in large scale (billions of rows, index
size: almost 800 GB) vacuuming the index takes an order of magnitude longer
than btrees (days/weeks instead of hours).
According to the code, during vacuum gist index is traversed in a logical
order which translates into random disk acceses (function gistbulkdelete in
gistvacuum.c). Btree indexes are vacuummed in physical order (function
btvacuumscan in nbtree.c).
As a workaround, I'm planning to replace all uses of 'contains' with the
following function:
CREATE OR REPLACE FUNCTION tstzrange_contains(
range tstzrange,
ts timestamptz)
RETURNS bool AS
$$
SELECT (ts >= lower(range) AND (lower_inc(range) OR ts > lower(range)))
AND (ts <= upper(range) AND (upper_inc(range) OR ts < upper(range)))
$$ LANGUAGE SQL IMMUTABLE;
and create btree indexes on lower and upper bound:
CREATE INDEX my_table_time_range_lower_idx ON my_table
(lower(time_range));
CREATE INDEX my_table_time_range_upper_idx ON my_table
(upper(time_range));
Is it the best approach?
--
Best regards,
Marcin Barczynski
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Gerardo Herzig | 2017-09-06 14:13:34 | Re: OS cache management |
| Previous Message | Soni M | 2017-09-06 08:12:26 | OS cache management |