Re: Zedstore - compressed in-core columnar storage

From: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-04-24 19:19:20
Message-ID: CALfoeitVUvPHJEjNacxSXmaH4OL+vmjmRGuToMSn=p_UGjHWaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 16, 2019 at 9:15 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

>
> I'm not sure it's that clear cut, actually. Sure, it's not the usual
> (block,item) pair so it's not possible to jump to the exact location, so
> it's not the raw physical identifier as regular TID. But the data are
> organized in a btree, with the TID as a key, so it does actually provide
> some information about the location.
>

From representation perspective its logical identifier. But yes
since
is used as used as key to layout datum's, there exists pretty
good
correlation between TIDs and physical location. Can consider it
as
clustered based on TID.

I've asked about BRIN indexes elsewhere in this thread, which I think is
> related to this question, because that index type relies on TID providing
> sufficient information about location. And I think BRIN indexes are going
> to be rather important for colstores (and formats like ORC have something
> very similar built-in).
>
> But maybe all we'll have to do is define the ranges differently - instead
> of "number of pages" we may define them as "number of rows" and it might
> be working.
>

BRIN indexes work for zedstore right now. A block range maps
to
just a range of TIDs in zedstore, as pointed out above. When one converts
a
zstid to an ItemPointer, can get the "block number" from
the

ItemPointer, like from a normal heap TID. It doesn't mean the
direct
physical location of the row in zedstore, but that's
fine.

It might be sub-optimal in some cases. For example if one
zedstore

page contains TIDs 1-1000, and another 1000-2000, and the entry in
the
BRIN index covers TIDs 500-1500, have to access both
zedstore

pages. Would be better if the cutoff points in the BRIN index
would
match the physical pages of the zedstore. But it still works, and
is
probably fine in
practice.

Plan is to add integrated BRIN index in zedstore, means keep
min-max
values for appropriate columns within page. This will not help
to
eliminate the IO as external BRIN index does but helps to
skip

uncompression and visibility checks etc... for blocks not matching
the
conditions.

Just to showcase brin works for zedstore, played with hands-on example
mentioned in
[1].

With btree index on zedstore

QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=4351.50..4351.51 rows=1 width=32) (actual
time=1267.140..1267.140 rows=1
loops=1)

-> Index Scan using idx_ztemperature_log_log_timestamp on
ztemperature_log (cost=0.56..4122.28 rows=91686 width=4) (actual
time=0.117..1244.112 rows=86400
loops=1)

Index Cond: ((log_timestamp >= '2016-04-04 00:00:00'::timestamp
without time zone) AND (log_timestamp < '2016-04-05 00:00:00'::timestamp
without time
zone))

Planning Time: 0.240
ms

Execution Time: 1269.016
ms

(5
rows)

With brin index on zedstore.
Note: Bitmap index for zedstore currently scans all the columns.
Scanning only required columns for query is yet to be implemented.

QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Finalize Aggregate (cost=217538.85..217538.86 rows=1 width=32) (actual
time=54.167..54.167 rows=1
loops=1)

-> Gather (cost=217538.63..217538.84 rows=2 width=32) (actual
time=53.967..55.184 rows=3
loops=1)

Workers Planned:
2

Workers Launched:
2

-> Partial Aggregate (cost=216538.63..216538.64 rows=1 width=32)
(actual time=42.956..42.957 rows=1
loops=3)

-> Parallel Bitmap Heap Scan on ztemperature_log
(cost=59.19..216446.98 rows=36660 width=4) (actual time=3.571..35.904
rows=28800
loops=3)

Recheck Cond: ((log_timestamp >= '2016-04-04
00:00:00'::timestamp without time zone) AND (log_timestamp < '2016-04-05
00:00:00'::timestamp without time
zone))

Rows Removed by Index Recheck:
3968

Heap Blocks:
lossy=381

-> Bitmap Index Scan on
idx_ztemperature_log_log_timestamp (cost=0.00..37.19 rows=98270 width=0)
(actual time=1.201..1.201 rows=7680
loops=1)

Index Cond: ((log_timestamp >= '2016-04-04
00:00:00'::timestamp without time zone) AND (log_timestamp < '2016-04-05
00:00:00'::timestamp without time
zone))

Planning Time: 0.240
ms

Execution Time: 55.341
ms

(13
rows)

schema_name | index_name | index_ratio |
index_size |
table_size

-------------+------------------------------------+-------------+------------+------------

public | idx_ztemperature_log_log_timestamp | 0 | 80
kB | 1235
MB

(1
row)

1]
https://www.postgresql.fastware.com/blog/brin-indexes-what-are-they-and-how-do-you-use-them

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-04-24 19:23:15 Re: Regression test coverage of GiST index build is awful
Previous Message Tom Lane 2019-04-24 19:17:06 Re: Calling PrepareTempTablespaces in BufFileCreateTemp