Re: count of occurences PLUS optimisation

From: Haller Christoph <ch(at)rodos(dot)fzk(dot)de>
To: trmcdougle(at)my-deja(dot)com (Thurstan R(dot) McDougle)
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: count of occurences PLUS optimisation
Date: 2001-09-13 14:44:54
Message-ID: 200109131244.OAA19247@rodos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Adam,
try this
select distinct job_num, count (job_num) from search_record
group by job_num ;
don't worry about an ordered list - group by does it for you.
Regards, Christoph
>
> HACKERS: see the end of this message about a possible optimisation for
> ORDER BY+LIMIT cases (the normal use of LIMIT?)
>
> Adam wrote:
> >
> > I help run a job database and have a table of search records. I want
> > a query that will return the top 10 jobs by search frequency. I'm
> > familiar with ORDER BY and LIMIT, so I basically need this:
> >
> > Given a table search_records:
> > job_num
> > -------
> > 1
> > 2
> > 2
> > 3
> > 4
> > 4
> > 4
> >
> > I want a query that will return:
> > job_num | count
> > --------+------
> > 1 |1
> > 2 |2
> > 3 |1
> > 4 |3
> >
> > I tried
> >
> > select distinct job_num, (select count(*) from search_records j where
> > j.job_num=k.job_num) from search_records k
> >
> > but it is horribly slow (it takes several minutes on a table of about
> > 25k rows!). I assume it scans the entire table for every job_num in
> > order to count the number of occurences of that job_num, taking order
> > n^2 time. Since I can easily use job_num as an index (being integers
> > from 0 to roughly 400 so far) I could just do a "select * from
> > search_records" and do the counting in PHP (our HTML pre-processor) in
> > order n time. However, I don't know how to do an order n*log(n) sort
> > in PHP, just n^2, so there would still be an efficiency problem.
> > I have Postgresql 7.0.3.
> > Help is of course greatly appreciated.
>
> I have not tried it but how about:-
>
> select job_num from
> (select job_num, count(*) as c from search_records group by job_num)
> order by c limit 10;
>
> I am not sure if count(*) would work in this context, if not try count()
> on some field that is in every record.
>
>
> If you can be sure that the top 10 will have at least a certain
> threshold of searches (perhaps >1!) then it MIGHT be faster, due to less
> data being sorted for the outer selects order by, (experiment) to do:-
>
> select job_num from
> (select job_num, count(*) as c from search_records group by job_num
> HAVING c>1)
> order by c limit 10;
>
> It would depend on how efficient the ORDER BY and LIMIT work together.
> (The ORDER BY could build a list of LIMIT n items and just replace items
> in that list...a lot more efficient both of memory and comparisons than
> building the full list and then keeping the top n)
>
> HACKERS: If it does not do this it might be a usefull optimisation.
> There would probably need to be a cutoff limit on whether to apply this
> method or sort and keep n. Also for LIMIT plus OFFSET it would need to
> build a list of the the total of the LIMIT and OFFSET figures.
>
> --
> This is the identity that I use for NewsGroups. Email to
> this will just sit there. If you wish to email me replace
> the domain with knightpiesold . co . uk (no spaces).
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Robert Sell 2001-09-13 15:07:38 Re: Printable report generation
Previous Message Leandro Rodrigo Saad Cruz 2001-09-13 14:39:27 Bad date external representation Was :[problems using pg_dump and datestyle format]

Browse pgsql-hackers by date

  From Date Subject
Next Message Ryan Mahoney 2001-09-13 15:26:46 Re: ERROR: Cannot insert a duplicate key into a
Previous Message Martijn van Oosterhout 2001-09-13 14:02:31 Re: count of occurences PLUS optimisation