Integrating HLL cardinality estimates with join operator estimation

From: Abhishek Kumar <akumar17(at)usc(dot)edu>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Integrating HLL cardinality estimates with join operator estimation
Date: 2024-11-24 00:31:09
Message-ID: CAGXSBMHkzmFPYekSr64FyFLKkv2zyO8m_q8s_9tS5=MY-Mn4tQ@mail.gmail.com
Lists: pgsql-hackers

Dear PostgreSQL hackers,

I am writing to seek guidance and potential collaboration on a project
involving cardinality estimation improvements in PostgreSQL. The project
aims to enhance join result cardinality estimation by incorporating
HyperLogLog (HLL) estimates alongside the existing join operator framework.

Project Overview:

- Goal: Improve the accuracy of join cardinality estimation using HLL
sketches
- Scope: Modify the existing join estimation logic to consider HLL-based
distinct count estimates
- Expected benefit: More accurate query plans for joins involving
columns with high cardinality
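
For readers unfamiliar with HLL, the following is a minimal sketch of how such a sketch estimates a distinct count. It is illustrative only, not a proposed implementation: the class name `HyperLogLog`, the precision `p=12`, and the use of SHA-1 as the hash are all assumptions made for the example, and a real integration would live in C inside the statistics code.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog sketch (hypothetical; for illustration only)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed value)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from the value; SHA-1 is an arbitrary choice.
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        # Bias-correction constant, valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw


sketch = HyperLogLog()
for i in range(50000):
    sketch.add(i)
    sketch.add(i)   # duplicates leave the registers unchanged
print(round(sketch.estimate()))
```

With 4096 registers the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%, at a fixed memory cost regardless of the number of rows seen.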

Technical Areas of Interest:

1. Current implementation of join selectivity estimation in
src/backend/optimizer
2. Integration points for HLL sketches within the existing statistics
framework
3. Potential modifications needed to the join operator logic
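
To make the integration point concrete, here is a rough sketch of the textbook equi-join estimate that distinct counts feed into. As far as I understand it, this corresponds to the case where no most-common-value lists are available; the function name `estimate_join_rows` and its parameters are hypothetical, and the distinct counts could come from any source, for example an HLL sketch.

```python
def estimate_join_rows(rows_r, rows_s, nd_r, nd_s,
                       nullfrac_r=0.0, nullfrac_s=0.0):
    """Estimate the row count of R JOIN S ON R.a = S.b.

    Hypothetical helper: assumes values are uniformly distributed and
    that every value of the smaller domain appears in the larger one.
    """
    # NULLs never match in an equi-join, so discount them on both sides.
    non_null = (1.0 - nullfrac_r) * (1.0 - nullfrac_s)
    # Selectivity is 1 / max(distinct counts) among non-NULL pairs.
    selectivity = non_null / max(nd_r, nd_s)
    return rows_r * rows_s * selectivity


# 1000 x 500 rows, 100 and 50 distinct join keys: 1000 * 500 / 100
print(estimate_join_rows(1000, 500, 100, 50))
```

Because the distinct count sits in the denominator, a relative error in it translates almost directly into a relative error in the join size, which is why a more accurate distinct-count source could matter for high-cardinality columns.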

Questions for the Community:

- Has similar work been attempted or discussed previously?
- What would be the preferred approach to integrate HLL estimates with
the existing join estimation framework?
- Are there specific areas of the codebase I should focus on initially?
- Would this enhancement align with the project's current direction for
query optimization?

I have previously modified the buffer replacement policy in Postgres,
replacing the clock sweep algorithm with a LazyBufferReplacementPolicy
built on FIFO queues, so I have some familiarity with the Postgres
codebase.

I would greatly appreciate any guidance, feedback, or suggestions from the
community. I'm happy to provide more detailed information about the
proposed approach or clarify any aspects of the project.

Thank you for your time and consideration.

Best regards,
Abhishek Kumar
