From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
---|---|
To: | "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Do we want a hashset type? |
Date: | 2023-05-31 15:40:22 |
Message-ID: | 2c047b70-160a-4c9b-b58b-7103fd78d5d4@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, May 31, 2023, at 16:53, Tomas Vondra wrote:
> I think this needs a better explanation - what exactly is a hashset in
> this context? Something like an array with a hash for faster lookup of
> unique elements, or what?
In this context, by "hashset" I am indeed referring to a data structure similar
to an array, where each element would be unique, and lookups would be faster
than arrays for larger number of elements due to hash-based lookups.
This data structure would store identifiers (IDs) of the nodes, not the complete
nodes themselves.
> Presumably it'd store whole adjacent nodes, not just some sort of node
> ID. So what if a node is adjacent to many other nodes? What if a node is
> added/deleted/modified?
That would require updating the hashset, which should be close to O(1) in
practical applications.
> AFAICS the main problem is the lookups of adjacent nodes, generating
> lot of random I/O etc. Presumably it's not that hard to keep the
> "relational" schema with table for vertices/edges, and then an auxiliary
> table with adjacent nodes grouped by node, possibly maintained by a
> couple triggers. A bit like an "aggregated index" except the queries
> would have to use it explicitly.
Yes, auxiliary table would be good, since we don't want to duplicate all
node-related data, and only store the IDs in the adjacent nodes hashset.
/Joel
From | Date | Subject | |
---|---|---|---|
Next Message | Chang Wei 昌維 | 2023-05-31 16:31:45 | Support edit order of the fields in table |
Previous Message | Yugo NAGATA | 2023-05-31 15:14:26 | Incremental View Maintenance, take 2 |