From: | Jacob Brazeal <jacob(dot)brazeal(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> |
Subject: | Re: Experimental tool to explore commitfest patches |
Date: | 2025-02-26 07:59:15 |
Message-ID: | CA+COZaDtJ-fa0Lu1zDW7W8op+k+y-77rhABqLz8U0MVAJ9w70g@mail.gmail.com |
Lists: | pgsql-hackers |
I wanted to provide a quick update on the app [0]. Here are the main issues
I've seen flagged so far:
1. The ranking system needs improvement. Ideally it should promote *relevant,
important, ready-for-review* patches.
2. We should display contributor names as they appear in the commitfest app
(this matters because we have to correlate names across several different
systems).
I will be working on all of these, but tonight I want to provide an update
on the ranking system. The app now predicts which committers might be a
good fit for a patch and displays the prediction alongside it. If you are a
committer and select your name in the queue, the patches predicted to suit
you will float to the top. As a quick sanity check, most of the cases I've
seen flagged so far are handled correctly by the new system. Here are some
more details on how it works.
The new recommendation system is based on keywords. I used an LLM to
extract technical keywords from the mailing list threads associated with
the last 10,000 git commits, and then trained a logistic regression model
to match the keywords to committers. I'm no expert at this, but I did some
basic statistical validation of the result on a training/test split and got
decent results: around 44% of the model's top choices were correct, and
just to be safe, I show the top 3 predicted committers for each patch in
the UX. Looking at specific folks like Robert, in our test dataset about
77% of the threads the model matched to him were ones he actually
committed (precision), and overall we identify about 45% of his commits
(recall). So, not perfect, but actually pretty likely to tag a mailing list
thread to the person who will commit it.
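To make the three figures above concrete, here is a minimal sketch of how top-k accuracy and per-committer precision and recall could be computed on a held-out test set. It is not the app's actual code: the function names, the committer names (alice, bob, carol) and the toy data are all made up for illustration, and it assumes precision and recall are measured on the model's top choice only, which may not be how the numbers above were derived.

```python
def top_k_accuracy(ranked_predictions, actual, k):
    """Fraction of threads whose actual committer is among the first k predictions."""
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, actual) if truth in ranked[:k]
    )
    return hits / len(actual)


def precision_recall(ranked_predictions, actual, committer):
    """Precision and recall for one committer, using only the top-1 prediction.

    Assumption: a thread counts as "matched to" a committer when that
    committer is the model's first choice.
    """
    predicted = [ranked[0] for ranked in ranked_predictions]
    true_pos = sum(
        1 for p, t in zip(predicted, actual) if p == committer and t == committer
    )
    predicted_pos = sum(1 for p in predicted if p == committer)
    actual_pos = sum(1 for t in actual if t == committer)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical test split: one ranked prediction list per thread,
# plus the committer who actually committed the patch.
ranked = [
    ["alice", "bob", "carol"],
    ["bob", "alice", "carol"],
    ["alice", "carol", "bob"],
    ["carol", "bob", "alice"],
]
actual = ["alice", "alice", "bob", "carol"]

top1 = top_k_accuracy(ranked, actual, k=1)
top3 = top_k_accuracy(ranked, actual, k=3)
alice_precision, alice_recall = precision_recall(ranked, actual, "alice")
```

In this toy data the top choice is right for 2 of 4 threads, while the top 3 always contain the actual committer, which is the motivation for showing three names rather than one.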
In the UX, if you are one of the top 3 identified committers, you will also
see a list of the top keywords from the mailing list thread that were
associated with you.
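One way such a keyword list could fall out of a linear model is sketched below. This is an illustration under assumptions, not the app's implementation: the weights, committer names and keywords are invented, intercepts are omitted, and "associated with you" is assumed to mean keywords in the thread that carry a positive weight for that committer.

```python
# Hypothetical per-committer keyword weights, of the kind a fitted
# linear model might produce. All values are made up.
WEIGHTS = {
    "alice": {"wal": 1.8, "replication": 1.2, "checkpoint": 0.9, "planner": -0.4},
    "bob": {"planner": 2.1, "join": 1.5, "wal": -0.3},
    "carol": {"vacuum": 1.7, "checkpoint": 0.2},
    "dave": {"psql": 1.1},
}


def rank_committers(thread_keywords, weights, k=3):
    """Return the k committers with the highest linear score for a thread.

    The score is the sum of the weights of the keywords present in the
    thread. Softmax is monotonic in these scores, so ranking by score
    gives the same order as ranking by predicted probability.
    """
    present = set(thread_keywords)
    scores = {
        committer: sum(w for kw, w in kw_weights.items() if kw in present)
        for committer, kw_weights in weights.items()
    }
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


def top_keywords(thread_keywords, weights, committer, n=3):
    """Keywords in the thread that push the score toward this committer."""
    committer_weights = weights.get(committer, {})
    contributions = {
        kw: committer_weights[kw]
        for kw in set(thread_keywords)
        if committer_weights.get(kw, 0.0) > 0
    }
    # Largest positive weight first; ties broken alphabetically.
    return sorted(contributions, key=lambda kw: (-contributions[kw], kw))[:n]


thread = ["wal", "planner", "checkpoint", "vacuum"]
top3 = rank_committers(thread, WEIGHTS)
keywords_by_committer = {c: top_keywords(thread, WEIGHTS, c) for c in top3}
```

Keywords with a negative weight for a committer (such as "planner" for alice here) are left out, since they count against the match rather than explain it.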