From: | Stefan Keller <sfkeller(at)gmail(dot)com> |
---|---|
To: | Hadi Moshayedi <hadi(at)citusdata(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PostgreSQL Columnar Store for Analytic Workloads |
Date: | 2014-04-08 06:28:09 |
Message-ID: | CAFcOn2_CUt8hkDyEH2tk=wY4EAP1ZiKkauySdYmitykR=VmiTg@mail.gmail.com |
Lists: | pgsql-hackers |
Hi Hadi
Do you think that cstore_fdw is also well suited for storing and
retrieving linked data (RDF)?
-S.
2014-04-03 18:43 GMT+02:00 Hadi Moshayedi <hadi(at)citusdata(dot)com>:
> Dear Hackers,
>
> We at Citus Data have been developing a columnar store extension for
> PostgreSQL. Today we are excited to open source it under the Apache v2.0
> license.
>
> This columnar store extension uses the Optimized Row Columnar (ORC) format
> for its data layout, which improves upon the RCFile format developed at
> Facebook, and brings the following benefits:
>
> * Compression: Reduces in-memory and on-disk data size by 2-4x. Can be
> extended to support different codecs. We used the functions in
> pg_lzcompress.h for compression and decompression.
> * Column projections: Only reads column data relevant to the query.
> Improves performance for I/O bound queries.
> * Skip indexes: Stores min/max statistics for row groups, and uses them to
> skip over unrelated rows.
>
> We used the PostgreSQL FDW APIs to make this work. The extension doesn't
> implement the writable FDW API, but it uses the process utility hook to
> enable the COPY command for the columnar tables.
>
> This extension uses PostgreSQL's internal data type representation to
> store data in the table, so this columnar store should support all data
> types that PostgreSQL supports.
>
> We tried the extension on the TPC-H benchmark with a 4GB scale factor on an
> m1.xlarge Amazon EC2 instance, and the query performance improved by 2x-3x
> compared to regular PostgreSQL tables. Note that we flushed the page cache
> before each test to see the impact on disk I/O.
>
> When data is cached in memory, the performance of cstore_fdw tables was
> close to the performance of regular PostgreSQL tables.
>
> For more information, please visit:
> * our blog post:
> http://citusdata.com/blog/76-postgresql-columnar-store-for-analytics
> * our github page: https://github.com/citusdata/cstore_fdw
>
> Feedback from you is really appreciated.
>
> Thanks,
> -- Hadi
>
>
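To make the "skip indexes" point in the quoted announcement concrete, here is a
minimal sketch in Python of the general min/max technique. It is not
cstore_fdw's actual code: the names RowGroup, build_row_groups and scan_range
are hypothetical, and a single integer column is assumed for simplicity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """A block of column values plus its min/max statistics (hypothetical layout)."""
    values: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Split a column into fixed-size row groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_range(groups: List[RowGroup], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # The statistics prove that no value in this group can match, so skip it
        # without touching its data.
        if group.max_value < low or group.min_value > high:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if low <= v <= high)
    return matches, groups_read
```

For example, with the values 0..99 split into groups of 10, a scan for the
range 25..34 reads only 2 of the 10 groups. Skipping pays off most when the
column is sorted or clustered; on randomly ordered data most groups overlap
the requested range and little is skipped.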