| From: | Shmagi Kavtaradze <kavtaradze(dot)s(at)gmail(dot)com> | 
|---|---|
| To: | pgsql-novice(at)postgresql(dot)org | 
| Subject: | Optimizing the Query | 
| Date: | 2016-02-16 16:39:34 | 
| Message-ID: | CAHY6mawRXcM91J84Ht4RO4TOoC2RVvDnAtSCcw7MFv5yXZRM3w@mail.gmail.com | 
| Lists: | pgsql-novice | 
I have a table (doc(varchar), sentenceId(int), bow(varchar[])). Into the bow
column I load the bag-of-words vectors. In total there are 4623 rows. The
table (nlptable) looks like this (for 10-word vectors):
 doc    | sentenceid | bow
--------+------------+------------------------
 corpus | 1          | {1,0,0,1,0,0,0,1,0,1}
 corpus | 2          | {0,1,1,1,0,1,0,0,0,0}
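For reference, here is a minimal version of the table with just those two
sample rows (a simplified sketch; the real table has 4623 rows):
create table nlptable (
    doc        varchar,
    sentenceid int,
    bow        varchar[]
);
-- the two sample rows shown above
insert into nlptable values
    ('corpus', 1, '{1,0,0,1,0,0,0,1,0,1}'),
    ('corpus', 2, '{0,1,1,1,0,1,0,0,0,0}');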
The query I run (it compares the bag-of-words representations of two sentences):
select a.doc, a.sentenceid, b.doc, b.sentenceid,
       cardinality(array(select unnest(array_positions(a.bow, '1'))
                         intersect
                         select unnest(array_positions(b.bow, '1'))))::float
         / cardinality(a.bow)::float
from
  nlptable a, nlptable b
where
  a.sentenceid < b.sentenceid;
The problem is that for 10 vectors (the 10 most common words) the execution
time is about 3 minutes, for 100 vectors about 25 minutes, and for 500
vectors 80 minutes. I have to run the calculation for the 10,000 most common
words, which will probably take a day. The query is too slow and I want to
optimize it, but I have no idea how. Is there an option to use some boolean
type instead of varchar[] to decrease the size of the data?
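For example, something along these lines is what I have in mind (an untested
sketch, assuming a hypothetical table nlptable_bits where bow is stored as
bit(10), and counting the set bits by casting the bit string to text):
-- hypothetical table nlptable_bits with bow as bit(10) instead of varchar[]
select a.doc, a.sentenceid, b.doc, b.sentenceid,
       -- bitwise AND keeps the common 1-positions; count them by casting
       -- the bit string to text and stripping the zeros
       length(replace((a.bow & b.bow)::text, '0', ''))::float
         / length(a.bow)::float as similarity
from nlptable_bits a, nlptable_bits b
where a.sentenceid < b.sentenceid;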
I am relatively new to Postgres, so I have no idea about optimization. I have
also heard that arrays in Postgres are heavy to work with, so I searched for
alternative ways to store the bag of words in a table, but could not find
anything other than arrays.