From: | Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
---|---|
To: | Jess Wren <jess(dot)wren(at)interference(dot)cc>, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: How to use full-text search URL parser to filter query results by domain name? |
Date: | 2019-04-08 11:50:33 |
Message-ID: | 941c4c08-23e3-ac5c-9e2e-8af76cadd3e3@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 07.04.2019 07:06, Jess Wren wrote:
> However, I can't figure out how I would integrate this into the above
> query to filter out duplicate domains from the results. And because this
> is the docs for "testing and debugging text search
> <https://www.postgresql.org/docs/11/textsearch-debugging.html#TEXTSEARCH-PARSER-TESTING>",
> I don't know if this use of `ts_parse()` is even related to how the URL
> parser is intended to be used in practice.
>
> How would I use the "host" parser in my query above to return one row
> per domain? Also, how would I appropriately index the "links" table for
> "host" and "url" token lookup?
I think it is normal to use ts_parse(). And I suppose you might use
windows functions.
For example, you have table links:
=# create table links (score int, link text);
=# insert into links values
(1, 'http://www.foo.com/bar'),
(2, 'http://www.foo.com/foo'),
(2, 'http://www.bar.com/foo'),
(1, 'http://www.bar.com/bar');
You can use the following query:
=# with l as (
select score, token, link,
rank() over (partition by token order by score) as rank
from links,
lateral ts_parse('default', link)
where tokid = 6)
select score, token, link from l where rank = 1;
score | token | link
-------+-------------+------------------------
1 | www.bar.com | http://www.bar.com/bar
1 | www.foo.com | http://www.foo.com/bar
It is just the idea, probably the query might be simpler.
--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
From | Date | Subject | |
---|---|---|---|
Next Message | Raghavendra Rao J S V | 2019-04-08 13:21:17 | Getting error while running the pg_basebackup through PGBOUNCER |
Previous Message | mariusz | 2019-04-08 11:06:44 | Re: SQl help to build a result with custom aliased bool column |