Re: Website mailing list search enhancement idea - one result per thread

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL WWW <pgsql-www(at)lists(dot)postgresql(dot)org>
Subject: Re: Website mailing list search enhancement idea - one result per thread
Date: 2021-02-06 09:05:51
Message-ID: 55df24092b8c1ffa44f382b6f3a681df@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-www

"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> When searching the mailing lists in advanced search mode we have list
> filtering, timeframe filtering (pretty coarse though) and then sorting
> mode. For moderately long running threads a search term is going to
> return
> many messages from the same thread. It would desirable to ask that
> only
> threads be shown and maybe how many individual messages in that thread
> matched.

+1. I always think so.

On 2020-05-01 01:55, Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Thu, Apr 30, 2020 at 2:17 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> "David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
>>>> It would desirable to ask that only
>>>> threads be shown and maybe how many individual messages in that
>>>> thread
>>>> matched.
>
>>> +1, an option to do that would be handy. No idea how hard it is ...
>
>> Not having actually looked at it, probably not too hard. The hardest
>> part
>> might be bikeshedding about the color (aka user interface).
>
>> Should this replace the current interface, or do we want to make
>> something
>> like this a choice?
>
> No no, not replace. It should be an option, "aggregate into threads"
> or something like that.

Why don't you implement the following feature?

(1) add an option on the advanced search page.

I agree with the idea to add the option to aggregate into threads
on the advanced search page.

I attached the image("add_option_to_aggregate_into_threads.png").
What do you think?

(2) change the message's URLs on the result pages if the option is
checked.

I think it's better to change the message's URLs to flatted ones on
result pages
if the messages are aggregated into threads because users want to access
the
discussion pages not per messages but per threads directly.

```
--- a/templates/search/listsearch.html
+++ b/templates/search/listsearch.html
@@ -53,7 +59,12 @@
<h2>Results {{firsthit}}-{{lasthit}} of {%if hitcount == 1000%}more
than 1000{%else%}{{hitcount}}{%endif%}.</h2>
{%if pagelinks %}Result pages:
{{pagelinks|safe}}<br/><br/>{%endif%}
{%for hit in hits %}
- {{forloop.counter0|add:firsthit}}. <a
href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a>
[{{hit.rank|floatformat:2}}]<br/>
+ {{forloop.counter0|add:firsthit}}.
+ {%if aggregate%}
+ <a
href="https://www.postgresql.org/message-id/flat/{{hit.messageid}}">{{hit.subject}}</a>
[{{hit.rank|floatformat:2}}]<br/>
+ {%else%}
+ <a
href="https://www.postgresql.org/message-id/{{hit.messageid}}">{{hit.subject}}</a>
[{{hit.rank|floatformat:2}}]<br/>
+ {%endif%}
From {{hit.author}} on {{hit.date}}.<br/>
{{hit.abstract|safe}}<br/>
<a
href="https://www.postgresql.org/message-id/{{hit.messageid}}">https://www.postgresql.org/message-id/{{hit.messageid}}</a><br/>
```

If someone already works for this, it's ok to ignore the following.
Although I'm not familiar with Django, I made the patches for pgarchives
and pgweb.

- pgweb: "0001-wip-add-an-search-option-to-aggregate-into-threads.patch"

This patch changes the user interface the above (1) and (2).

- pgarchives:
"0001-wip-add-an-search-parameter-to-aggregate-into-thread.patch"

This patch changes the search query to the Postgresql. The example query
is the following.
To aggregate into threads, it uses the window function to extract the
highest-ranked
messages in the same thread group.

```
SELECT messageid, date, subject, _from, ts_rank_cd(fti,
plainto_tsquery('public.pg', 'wal writer')),
ts_headline(bodytxt, plainto_tsquery('public.pg', 'wal
writer'),'StartSel="[[[[[[",StopSel="]]]]]]"')
FROM
(SELECT messageid, date, subject, _from, fti, bodytxt,
-- add a sub query
RANK() OVER (PARTITION BY threadid ORDER BY ts_rank_cd(fti,
plainto_tsquery('wal writer')) DESC, id) AS rank -- this line is main
change
FROM messages m
WHERE
fti @@ plainto_tsquery('public.pg', 'wal writer')
AND EXISTS (SELECT 1 FROM list_threads lt WHERE
lt.threadid=m.threadid AND lt.listid=ANY(ARRAY[2]))
AND m.date > '2020-02-07T08:44:56.738979'::timestamp
) m1
WHERE m1.rank = 1
ORDER BY ts_rank_cd(fti, plainto_tsquery('wal writer')) DESC
LIMIT 1000;
```

If you have any good ideas, please let me know.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachment Content-Type Size
add_option_to_aggregate_into_threads.png image/png 94.2 KB
0001-wip-add-an-search-option-to-aggregate-into-threads.patch text/x-diff 4.1 KB
0001-wip-add-an-search-parameter-to-aggregate-into-thread.patch text/x-diff 2.1 KB

In response to

Responses

Browse pgsql-www by date

  From Date Subject
Next Message Jonathan S. Katz 2021-02-06 15:31:54 Re: Documentation building broken in CFBot
Previous Message Sad Man 2021-02-04 05:46:33 Wiki editor request