Re: Using the indexing and sampling APIs to realize progressive features

From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: hohenstein(at)cs(dot)uni-kl(dot)de
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Using the indexing and sampling APIs to realize progressive features
Date: 2022-02-03 19:27:44
Message-ID: CAM+6J95Ljza7fsQAHcmh2VgCKbUnw9YVuKxWwQJNZiSuOR1+wQ@mail.gmail.com
Lists: pgsql-general

On Thu, Feb 3, 2022, 8:55 PM <hohenstein(at)cs(dot)uni-kl(dot)de> wrote:

> Hi,
>
> I have some questions regarding the indexing and sampling API.
>
> My aim is to implement a variant of progressive indexing as seen in this
> paper (link). To summarize,
>
> I want to implement a variant of online aggregation, where an aggregate
> query (such as SUM or AVG) is answered in real time and the result
> becomes more and more accurate as more tuples are consumed.
>
> I thought that I could maybe use a custom sampling routine to consume
> table samples until I have seen the whole table with no duplicate tuples.
>
>

I am not sure if I understand correctly, but if this is referring to
faceted search, then the links below may be of some help.

https://github.com/citusdata/postgresql-hll
https://github.com/hyperstudio/repertoire-faceting

Performance may vary, but they should give you an idea of how this can be
implemented.
You also have ROLLUP and CUBE (grouping sets), but they get slow on large
tables and need more resources to speed up.

https://www.cybertec-postgresql.com/en/postgresql-grouping-sets-rollup-cube/
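
In case it helps to pin down the terminology, here is a minimal Python
sketch of online aggregation as I read your description, done outside
PostgreSQL. Rows are visited in a random order without replacement, and a
running average is reported after every batch, so the estimate matches the
exact answer once every row has been seen. The name progressive_mean and
its parameters are made up for illustration; nothing here uses the
PostgreSQL tablesample or index access method APIs.

```python
import random
from typing import Iterator, Sequence, Tuple


def progressive_mean(
    values: Sequence[float], batch_size: int, seed: int = 0
) -> Iterator[Tuple[int, float]]:
    """Yield (rows_seen, running_mean) after each batch of rows.

    Hypothetical helper: rows are consumed in a shuffled order, so no row
    is visited twice and the last estimate equals the exact mean.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    # Shuffle row positions once; this is sampling without replacement.
    order = list(range(len(values)))
    random.Random(seed).shuffle(order)

    total = 0.0
    seen = 0
    for start in range(0, len(order), batch_size):
        for i in order[start:start + batch_size]:
            total += values[i]
            seen += 1
        # Report the current estimate before consuming the next batch.
        yield seen, total / seen


if __name__ == "__main__":
    data = list(range(1, 101))
    for rows_seen, estimate in progressive_mean(data, batch_size=10):
        print(rows_seen, estimate)
```

Each printed line is one refinement step. Inside the server, the shuffled
order would have to come from whatever sampling routine ends up being used.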

If this is not what you wanted, feel free to ignore.
