Re: Do we want a hashset type?

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do we want a hashset type?
Date: 2023-05-31 15:40:22
Message-ID: 2c047b70-160a-4c9b-b58b-7103fd78d5d4@app.fastmail.com
Lists: pgsql-hackers

On Wed, May 31, 2023, at 16:53, Tomas Vondra wrote:
> I think this needs a better explanation - what exactly is a hashset in
> this context? Something like an array with a hash for faster lookup of
> unique elements, or what?

In this context, by "hashset" I am indeed referring to a data structure similar
to an array, except that each element would be unique, and lookups would be
faster than in an array for larger numbers of elements, thanks to hash-based
lookups.

This data structure would store identifiers (IDs) of the nodes, not the complete
nodes themselves.
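
To make that concrete, here is a rough sketch of what usage could look like;
the type name "hashset" and the function hashset_contains() are just
illustrative placeholders, not a settled design:

    -- Hypothetical sketch; type and function names are placeholders.
    CREATE TABLE node (
        id       int PRIMARY KEY,
        label    text,
        adjacent hashset   -- IDs of adjacent nodes, unique, hash-based lookup
    );

    -- Membership test would be a hash lookup instead of a linear array scan:
    SELECT id FROM node WHERE hashset_contains(adjacent, 42);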

> Presumably it'd store whole adjacent nodes, not just some sort of node
> ID. So what if a node is adjacent to many other nodes? What if a node is
> added/deleted/modified?

That would require updating the hashset, which should be close to O(1) in
practical applications.
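
As a rough illustration (placeholder function names again), adding or removing
an edge would only touch the adjacency sets of the two endpoint nodes, and each
call should amortize to roughly O(1):

    UPDATE node SET adjacent = hashset_add(adjacent, 7)    WHERE id = 3;
    UPDATE node SET adjacent = hashset_remove(adjacent, 7) WHERE id = 3;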

> AFAICS the main problem is the lookups of adjacent nodes, generating
> lot of random I/O etc. Presumably it's not that hard to keep the
> "relational" schema with table for vertices/edges, and then an auxiliary
> table with adjacent nodes grouped by node, possibly maintained by a
> couple triggers. A bit like an "aggregated index" except the queries
> would have to use it explicitly.

Yes, an auxiliary table would be good, since we don't want to duplicate all the
node-related data; we would only store the IDs in the adjacent-nodes hashset.
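
To spell out the auxiliary-table idea, here is a minimal sketch of the
relational schema you describe, with the adjacency kept in sync by a trigger.
The adjacency is stored as a plain int[] here; swapping that column for a
hashset type is exactly the part this thread is about. Only the insert path is
shown, delete/update triggers and the reverse direction are omitted:

    CREATE TABLE vertex (id int PRIMARY KEY, payload text);
    CREATE TABLE edge (
        src int REFERENCES vertex,
        dst int REFERENCES vertex,
        PRIMARY KEY (src, dst)
    );

    -- Auxiliary "aggregated index": adjacent node IDs grouped per node.
    CREATE TABLE adjacency (
        id        int PRIMARY KEY REFERENCES vertex,
        neighbors int[]
    );

    CREATE FUNCTION edge_ins() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO adjacency (id, neighbors) VALUES (NEW.src, ARRAY[NEW.dst])
        ON CONFLICT (id) DO UPDATE
            SET neighbors = array_append(adjacency.neighbors, NEW.dst);
        RETURN NEW;
    END $$;

    CREATE TRIGGER edge_ins AFTER INSERT ON edge
        FOR EACH ROW EXECUTE FUNCTION edge_ins();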

/Joel
