From: | Junwang Zhao <zhjwpku(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | Amit Langote <amitlangote09(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>, Ajay Pal <ajay(dot)pal(dot)k(at)gmail(dot)com>, Imran Zaheer <imran(dot)zhir(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: SQL Property Graph Queries (SQL/PGQ) |
Date: | 2025-04-07 08:13:35 |
Message-ID: | CAEG8a3JrO--th6N6oQebqAvZZBHShOJ5Z-fZSS8uNJZs-tQ_tQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Ashutosh,
On Mon, Apr 7, 2025 at 1:19 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Sat, Apr 5, 2025 at 6:20 PM Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
> >
> > Hi Ashutosh and Peter,
> >
> > Since this PGQ feature won't be in PG 18, I'd like to raise a discussion of
> > the possibility of implementing the quantifier feature, which I think is a
> > quite useful feature in the graph database area.
>
> I agree that quantifiers feature is very useful; it's being used in
> many usecases. However, it's a bit of a complex feature. IMO, we
> should keep that discussion as well as the patch in a separate thread,
> so that this patchset doesn't grow too large to review and also
> discussion in this thread can remain focused. Once we get the current
> patch set reviewed and committed we can tackle the quantifier problem
> in a separate discussion. Of course that doesn't mean that we can not
> start discussion, try POC and even a working patch for quantifier
> support.
>
> Peter may think otherwise.
>
> >
> > I'll start with a graph definition first.
> >
> > `Person(id, name, age, sex)` with id as PK
> > `Knows(id, start_id, end_id, since)` with id as PK, start_id and
> > end_id FK referencing Person's id
> >
> > insert into Person values(1, 'A', 31, 'M'), (2, 'B', 30, 'F'), (3,
> > 'C', 33, 'M'), (4, 'D', 31, 'F'), (5, 'E', 32, 'M'), (6, 'F', 33,
> > 'M');
> > insert into Knows values (1, 1, 2, '2020'); -- A knows B since 2020
> > insert into Knows values (2, 1, 3, '2021'); -- A knows C since 2021
> > insert into Knows values (3, 1, 4, '2020'); -- A knows D since 2020
> > insert into Knows values (4, 2, 4, '2023'); -- B knows D since 2023
> > insert into Knows values (5, 3, 5, '2022'); -- C knows E since 2022
> > insert into Knows values (6, 2, 6, '2021'); -- B knows F since 2021
> > insert into Knows values (7, 4, 6, '2020'); -- D knows F since 2020
> >
> > Then we create a property graph:
> >
> > CREATE property graph new_graph
> > VERTEX TABLES (Person)
> > EDGE TABLES (Knows);
> >
> > If we want to find A's non-directly known friends within 3 hops, we can query:
> >
> > select name from graph_table (new_graph match (a:Person WHERE a.name =
> > 'A') --> (b:Person) --> (c:Person) COLUMNS (c.name))
> > union
> > select name from graph_table (new_graph match (a:Person WHERE a.name =
> > 'A') --> (b:Person) -->(c:Person)-->(d:Person) COLUMNS (d.name));
> >
> > Or if we support quantifier, we can simply the query as:
> >
> > select name from graph_table (new_graph match (a:Person WHERE a.name =
> > 'A') -->{2,3} (b:Person) COLUMNS (b.name));
> >
> > In the current design of PostgreSQL, we can rewrite this pattern with
> > quantifiers to
> > the union form with some effort.
> >
> > But what if the pattern is more complicated, for example:
> >
> > 1. select name, since from graph_table (new_graph match (a:Person
> > WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
> > r.since));
> > Can we support the r.since column? I guess not, in this case r is a
> > variable length edge.
> >
> > 2. select name, count from graph_table (new_graph match (a:Person
> > WHERE a.name = 'A') -[r:Knows]->{2,3} (b:Person) COLUMNS (b.name,
> > count(r)));
> > Can we support this count aggregation(this is called horizontal
> > aggregation in Oracle's pgql)? How can the executor know the length of
> > the variable length edge?
> >
> > 3. What if the query doesn't specify the Label of edge, and there can
> > be different edge labels of r, can we easily do the rewrite?
> >
> > I did some study of the apache age, they have fixed columns for node
> > labels(id, agtype)
> > and edge labels(id, source_id, end_id, agtype), agtype is kind of
> > json. So they can
> > resolve the above question easily.
> >
> > Above are just my random thoughts of the quantifier feature, I don't have a copy
> > of the PGQ standard, so I'd like to hear your opinion about this.
> >
>
> I think the questions you have raised are valid. If we decide to
> discuss this in a separate thread, I will start that thread just by
> responding to these questions and design I have in mind.
I'm ok with starting a new thread for quantifier discussion, and I'd
really happy to know your design on this.
>
> --
> Best Wishes,
> Ashutosh Bapat
--
Regards
Junwang Zhao
From | Date | Subject | |
---|---|---|---|
Next Message | Dave Page | 2025-04-07 08:16:40 | Re: History doc page clarification on naming |
Previous Message | Jakub Wartak | 2025-04-07 08:09:26 | Re: Draft for basic NUMA observability |