Re: Query on indexed table too slow

From: Shmagi Kavtaradze <kavtaradze(dot)s(at)gmail(dot)com>
To: Fábio Moreira <fabio(at)dias(dot)moreira(dot)nom(dot)br>
Cc: pgsql-novice novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Query on indexed table too slow
Date: 2016-03-09 11:07:27
Message-ID: CAHY6mazDUs-jUC0zoMVOfVkCe-BhPv2k8sZwZavgcznp5cjwGA@mail.gmail.com
Lists: pgsql-novice

Actually, I have a dataset with at least 4623 sentences, and each bag-of-words
representation will consist of 10,000 characters, i.e. 250 chunks of
40-character strings. I tried creating a table instead of displaying the
results. For a small dataset, creating a table takes less time than just
selecting, but as the data grows to 250 chunks, the time for displaying and
the time for just creating the table are almost the same.
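To be concrete, by "creating a table instead of displaying" I mean wrapping
the same join in CREATE TABLE ... AS, roughly like this (the `similarity`
table and column aliases are just example names):

CREATE TABLE similarity AS
SELECT
    a.sentenceid AS sentenceid_a,
    b.sentenceid AS sentenceid_b,
    a.chunkid,
    Length(Replace(Cast(a.chunk & b.chunk AS TEXT), '0', ''))::float /
        Length(a.chunk)::float AS score
FROM chunks2 a
INNER JOIN chunks2 b
    ON a.sentenceid < b.sentenceid AND a.chunkid = b.chunkid;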

On Tue, Mar 8, 2016 at 1:20 AM, Fábio Moreira <fabio(at)dias(dot)moreira(dot)nom(dot)br>
wrote:

> Hi Shmagi,
>
> Your table is tiny -- tiny enough that the actual working set (the rows of
> chunks2, without the "doc" field) easily fits in memory; you're unlikely to
> get a large improvement here, especially considering how, by the looks of
> the query planner output, you have relatively few distinct chunkid values
> (is that so? can you post SELECT COUNT(DISTINCT chunkid) FROM chunks2?).
>
> Two things come to mind here:
>
> 1) that similarity comparison you're doing looks expensive, with a cast to
> TEXT and string manipulation and what not. I'm not sure if PostgreSQL has a
> native bit count method to speed this part of the code up; you could
> implement one in C to see if that improves things, but it might be easier
> (if indeed you only have 10 bits) to do instead:
>
> CREATE TEMPORARY TABLE popcount AS
> SELECT
>     i::BIT(10) AS value,
>     Length(Replace(Cast(i::BIT(10) AS TEXT), '0', '')) AS popcount
> FROM
>     generate_series(0, 1023, 1) AS temp(i);
>
> and join against that table (again, it's tiny, so no real need for an
> index here).
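> For example, your original query could then be rewritten as something like
> this (a sketch only; note that Length(a.chunk) is always 10 for BIT(10), so
> the denominator becomes a constant):
>
> SELECT
>     a.sentenceid, b.sentenceid, a.chunkid,
>     pc.popcount::float / 10
> FROM chunks2 a
> INNER JOIN chunks2 b
>     ON a.sentenceid < b.sentenceid AND a.chunkid = b.chunkid
> INNER JOIN popcount pc
>     ON pc.value = (a.chunk & b.chunk);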
>
> 2) do you really need all 100M rows of the output? I assume this is some
> document similarity metric you're after? What will be done with this data
> after it is generated?
>
> []s, Fábio.
>
> On Tue, Mar 8, 2016 at 2:12 AM, Shmagi Kavtaradze <kavtaradze(dot)s(at)gmail(dot)com>
> wrote:
>
>> I have a table with 46230 rows (1 doc, 4623 sentences and 10 chunks for
>> each sentence):
>>
>> create table chunks2(
>>     doc varchar,
>>     sentenceid int,
>>     chunkid int,
>>     chunk bit(10)
>> );
>>
>> With the query I want to compare sentence chunks with other sentence
>> chunks that have same chunkid:
>>
>> SELECT
>>     a.sentenceid,
>>     b.sentenceid,
>>     a.chunkid,
>>     Length(Replace(Cast(a.chunk & b.chunk AS TEXT), '0', ''))::float /
>>         Length(a.chunk)::float
>> FROM chunks2 a
>> INNER JOIN chunks2 b
>>     ON a.sentenceid < b.sentenceid AND a.chunkid = b.chunkid;
>>
>> I ran EXPLAIN ANALYZE on the unindexed table, with a composite index, and
>> with both columns indexed separately, but the time is the same in all cases:
>>
>> Indexed on (sentenceid, chunkid):
>>
>> Hash Join  (cost=1335.17..4549476.28 rows=71249559 width=26) (actual
>> time=144.376..1156178.110 rows=106837530 loops=1)
>>   Hash Cond: (a.chunkid = b.chunkid)
>>   Join Filter: (a.sentenceid < b.sentenceid)
>>   Rows Removed by Join Filter: 106883760
>>   ->  Seq Scan on chunks2 a  (cost=0.00..757.30 rows=46230 width=15)
>> (actual time=0.039..77.275 rows=46230 loops=1)
>>   ->  Hash  (cost=757.30..757.30 rows=46230 width=15) (actual
>> time=142.954..142.954 rows=46230 loops=1)
>>         Buckets: 65536  Batches: 1  Memory Usage: 2680kB
>>         ->  Seq Scan on chunks2 b  (cost=0.00..757.30 rows=46230
>> width=15) (actual time=0.031..64.340 rows=46230 loops=1)
>> Planning time: 1.209 ms
>> Execution time: 1212779.012 ms
>>
>>
>> I know they perform the same operations and no index was used. Where is my
>> mistake, and how can I speed up the query with an index? Or how can I use
>> indexes efficiently in my case?
>>
>>
>
>
> --
> Fábio Dias Moreira
>
