<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
I am trying to create a Sphinx index on a fairly large Postgres table.
My <br>
problem is the fact that the Postgres API is trying to put the entire <br>
result set into the memory:<br>
<tt><font color="#3333ff"><br>
</font></tt>
<blockquote><tt><font color="#3333ff">[root(at)medo etc]# ../bin/indexer
--all</font></tt><br>
<tt><font color="#3333ff">Sphinx 0.9.9-release (r2117)</font></tt><br>
<tt><font color="#3333ff">Copyright (c) 2001-2009, Andrew Aksyonoff</font></tt><br>
<br>
<tt><font color="#3333ff">using config file
'/usr/local/etc/sphinx.conf'...</font></tt><br>
<tt><font color="#3333ff">indexing index 'test1'...</font></tt><br>
<u><b><tt><font color="#3333ff"><font color="#ff0000">E</font><font
color="#ff0000">RROR: index 'test1': sql_query: out of memory for
query result</font></font></tt></b></u><br>
<tt><font color="#3333ff"> (DSN=pgsql://<a class="moz-txt-link-freetext" href="news:***(at)medo:5432/news">news:***(at)medo:5432/news</a>).</font></tt><br>
<tt><font color="#3333ff">total 0 docs, 0 bytes</font></tt><br>
<tt><font color="#3333ff">total 712.593 sec, 0 bytes/sec, 0.00
docs/sec</font></tt><br>
<tt><font color="#3333ff">total 0 reads, 0.000 sec, 0.0 kb/call avg,
0.0 msec/call avg</font></tt><br>
<tt><font color="#3333ff">total 0 writes, 0.000 sec, 0.0 kb/call avg,
0.0 msec/call avg</font></tt><br>
</blockquote>
<br>
Corresponding log entries on the Postgres side are:<br>
<br>
<blockquote><tt><font color="#3333ff">STATEMENT: SELECT
segment_id,air_date,start_time,end_time,source_type, </font></tt><br>
<tt><font color="#3333ff">market_name,station_name,program_name,
content_text FROM news_segments</font></tt><br>
<tt><font color="#3333ff">LOG: unexpected EOF on client connection</font></tt><br>
<tt><font color="#3333ff">LOG: unexpected EOF on client connection</font></tt><br>
<tt><font color="#3333ff">LOG: unexpected EOF on client connection</font></tt><br>
</blockquote>
<br>
The Postgres message isn't exactly helpful, but given the
circumstances, it can't be more helpful. The problem is on the client
side. The table I am using is pretty large and has 14.3 million rows:<br>
<br>
<blockquote><tt><font color="#3333ff">news=> select count(*) from
news_segments;</font></tt><br>
<tt><font color="#3333ff"> count </font></tt><br>
<tt><font color="#3333ff">----------</font></tt><br>
<tt><font color="#3333ff"> 14366286</font></tt><br>
<tt><font color="#3333ff">(1 row)</font></tt><br>
</blockquote>
<tt><font color="#3333ff"></font></tt>
<blockquote><tt><font color="#3333ff">Time: 233759.639 ms</font></tt><br>
</blockquote>
<br>
Is there anything I can do to prevent the API from attempting to put
the <br>
entire query result in memory? I can partition the table, create <br>
separate indexes and merge them, but that is a large unnecessary <br>
maintenance. I also suspect that the other queries with a large result <br>
set will start to fail.<br>
<br>
I temporarily solved my problem by using "range query" option offered
by <br>
sphinx:<br>
<tt><font color="#3333ff"><br>
</font></tt>
<blockquote><tt><font color="#3333ff">sql_query_range = \</font></tt><br>
<tt><font color="#3333ff"> SELECT
min(segment_id),max(segment_id) FROM news_segments</font></tt><br>
<tt><font color="#3333ff"> sql_range_step=10000</font></tt><br>
<tt><font color="#3333ff">
sql_query = \</font></tt><br>
<tt><font color="#3333ff"> SELECT </font></tt><br>
<tt><font color="#3333ff">segment_id,air_date,start_time,end_time,source_type,
\</font></tt><br>
<tt><font color="#3333ff">
market_name,station_name,program_name, </font></tt><br>
<tt><font color="#3333ff">segment_text \</font></tt><br>
<tt><font color="#3333ff"> FROM news_segments \</font></tt><br>
<tt><font color="#3333ff"> WHERE
segment_id>=$start and segment_id<$end</font></tt><br>
</blockquote>
<br>
Segment_id is a numeric field and the query will be executed many
times, <br>
which is less than optimal. It does make the thing work, though. Would
it <br>
be possible to set maximum memory for the query result caching from the
<br>
API itself? How can I increase the maximum memory size used by the
client API?<br>
<br>
<pre class="moz-signature" cols="72">--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
<a class="moz-txt-link-abbreviated" href="http://www.vmsinfo.com">www.vmsinfo.com</a>
</pre>
</body>
</html>