From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Joel Reymont <joelr1(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: optimizing a cpu-heavy query |
Date: | 2011-04-26 16:00:50 |
Message-ID: | 8038.1303833650@sss.pgh.pa.us |
Lists: | pgsql-general |
Joel Reymont <joelr1(at)gmail(dot)com> writes:
> I'm trying to optimize the following query, which computes the KL divergence [1]. As you can see, the distance function operates on vectors of 150 floats.
> CREATE OR REPLACE FUNCTION docs_within_distance(vec topics, threshold float)
> RETURNS TABLE(id doc_id, distance float) AS $$
> BEGIN
> RETURN QUERY
> SELECT *
> FROM (SELECT doc_id,
>              (SELECT sum(vec[i] * ln(vec[i] / topics[i]))
>               FROM generate_subscripts(topics, 1) AS i
>               WHERE topics[i] > 0) AS distance
>       FROM docs) AS tab
> WHERE tab.distance <= threshold;
> END;
> $$ LANGUAGE plpgsql;
Yikes. That sub-select is a mighty expensive way to compute the scalar
product. Push it into a sub-function that takes the two arrays and
iterates over them with a for-loop. For another couple orders of
magnitude, convert the sub-function to C code. (I don't think you need
a whole data type, just a function that does the scalar product.)
regards, tom lane
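
A minimal sketch of the sub-function approach described above, assuming the topics type is (or casts to) a plain float8[] array; the name kl_divergence and the IMMUTABLE/STRICT markings are illustrative, not from the thread:

    CREATE OR REPLACE FUNCTION kl_divergence(vec float8[], other float8[])
    RETURNS float8 AS $$
    DECLARE
        result float8 := 0;
    BEGIN
        -- One pass over the arrays: accumulate vec[i] * ln(vec[i] / other[i]),
        -- skipping positions where other[i] is not positive, which mirrors the
        -- WHERE topics[i] > 0 guard in the original sub-select.
        FOR i IN array_lower(vec, 1) .. array_upper(vec, 1) LOOP
            IF other[i] > 0 THEN
                result := result + vec[i] * ln(vec[i] / other[i]);
            END IF;
        END LOOP;
        RETURN result;
    END;
    $$ LANGUAGE plpgsql IMMUTABLE STRICT;

With such a helper, the body of docs_within_distance reduces to something like:

    SELECT *
    FROM (SELECT doc_id, kl_divergence(vec, topics) AS distance
          FROM docs) AS tab
    WHERE tab.distance <= threshold;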