From: | "Tena Sakai" <tsakai(at)gallo(dot)ucsf(dot)edu> |
---|---|
To: | "Emanuel Calvo Franco" <postgres(dot)arg(at)gmail(dot)com>, <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: very, very slow performance |
Date: | 2009-02-20 19:59:41 |
Message-ID: | FE44E0D7EAD2ED4BB2165071DB8E328C0378F76C@egcrc-ex01.egcrc.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
Hi Emanuel,
> Do you tried partitioned tables? diferent tablespaces?
> set the storage external for more important columns?
> what kind of indexes do you have?
I think the table design is ok.
I don't know what "set the storage external" means.
Maybe you can explain it to me in lay-person's terms?
I think indices are ok, but I welcome your comments.
canon=# \d genotype
Table "public.genotype"
Column | Type | Modifiers
-----------------+-----------------------------+---------------------------------------------------------------
genotypeid | integer | not null default nextval('genotype_genotypeid_seq'::regclass)
subjectid | integer |
markerid | integer |
allele1id | integer |
allele2id | integer |
datecreated | timestamp without time zone | not null
datereplaced | timestamp without time zone | not null
ignore | character(1) | not null default 'N'::bpchar
inconsistent | character(1) | not null default 'N'::bpchar
sourcetablename | character varying | not null
sourceid | character varying | not null
Indexes:
"genotype_pkey" PRIMARY KEY, btree (genotypeid, datecreated)
"genotype_genotypeid_idx" btree (genotypeid)
"genotype_markerid_idx" btree (markerid)
"genotype_source_idx" btree (sourceid, sourcetablename)
"genotype_subjectid_idx" btree (subjectid)
canon=# \d allele
Table "public.allele"
Column | Type | Modifiers
--------------+-----------------------------+-----------------------------------------------------------
alleleid | integer | not null default nextval('allele_alleleid_seq'::regclass)
markerid | integer |
value | character varying | not null
datecreated | timestamp without time zone | not null
datereplaced | timestamp without time zone | not null
Indexes:
"allele_pkey" PRIMARY KEY, btree (alleleid, datecreated)
"allele_markerid_idx" btree (markerid)
> try to run explain analyze for those querys, then
> post the results.
Yes, it is running for last 13+ hours and I have no
idea how much longer it might take.
> I recommend follow this thread:
> http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php
That's pretty interesting stuff. I need to experiment tweaking
parameters Joshua mentions. Alas, now is not a good time to do
so.
Regards,
Tena Sakai
-----Original Message-----
From: pgsql-admin-owner(at)postgresql(dot)org on behalf of Emanuel Calvo Franco
Sent: Fri 2/20/2009 4:10 AM
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] very, very slow performance
2009/2/20 Tena Sakai <tsakai(at)gallo(dot)ucsf(dot)edu>:
> Hi Everybody,
>
> Here's the query (please read them in fixed-size font, if
> you can):
>
> select subjectid, genotype.markerid, a1.value as allele1,
> a2.value as allele2, genotype.dateCreated,
> genotype.dateReplaced, genotype.ignore,
> genotype.inconsistent
> from genotype, allele a1, allele a2
> where
> allele1id = a1.alleleid
> and
> allele2id = a2.alleleid;
>
> Genotype table mentioned above has about 600,000,000+ rows. As
> I mentioned, there are 20 more of them running concurrently.
> 3 other jobs look like:
>
Do you tried partitioned tables? diferent tablespaces? set the storage
external for
more important columns? what kind of indexes do you have?
> SELECT a.markerid,a.type,a.localname,b.ncbibuild,
> a.chromosome,a.geneticposition,b.physicalposition,
> a.strand,a.stdflanks,a.maxflanks,a.datecreated,
> a.datereplaced,a.sourcetablename,a.sourceid,
> b.dbsnprsid,a.ignore,a.gene
> FROM public.marker a, public.snpposition b
> WHERE
> a.ignore= 'N'
> AND a.datecreated <= (timestamp'Wed Oct 29 09:35:54.266 2008')
> AND a.datereplaced > (timestamp'Wed Oct 29 09:35:54.266 2008')
> AND a.localname IN
> ('RS10757474','RS7859598','RS6148','RS9792663','RS1541125',
> 'RS10511446','RS10814410','RS12338622','RS875587',
> 'RS1590979', 'RS748786','RS958505','RS12352961',
> and on and on and on...);
>
>
> insert into summarystats
> select 'Marker by Chromosomes', chromosome,
> sourcetablename,
> count(*), null, to_timestamp('2009-02-18
> 20:29:40.125',
> 'yyyy-mm-dd hh:mi:ss.ms')
> from marker
> where ignore = 'N'
> and datereplaced = '3000-01-01 12:00:00.000'
> and exists (select 1
> from genotype
> where genotype.markerid =
> marker.markerid
> and genotype.ignore = 'N'
> and genotype.datereplaced =
> '3000-01-01 12:00:00.000')
> group by chromosome, sourcetablename;
>
> COPY public.genotype (genotypeid, subjectid, markerid,
> allele1id, allele2id, datecreated, datereplaced,
> ignore, inconsistent, sourcetablename, sourceid)
> TO stdout;
>
Like Scott saids, try to run explain analyze for those querys, then
post the results.
>
> Can anybody suggest anything that I can do to gain speed?
> Any help is much appreciated.
>
I recommend follow this thread:
http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php
> Regards,
>
> Tena Sakai
> tsakai(at)gallo(dot)ucsf(dot)edu
>
>
>
--
Emanuel Calvo Franco
Sumate al ARPUG !
(www.postgres-arg.org -
www.arpug.com.ar)
ArPUG / AOSUG Member
Postgresql Support & Admin
From | Date | Subject | |
---|---|---|---|
Next Message | Emanuel Calvo Franco | 2009-02-20 20:11:44 | Fwd: very, very slow performance |
Previous Message | Scott Marlowe | 2009-02-20 16:14:09 | Re: Crash with data corruption under Windows |