From: | Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> |
---|---|
To: | Chris Smith <cdsmith(at)twu(dot)net> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Interpreting query plan |
Date: | 2004-07-06 16:53:34 |
Message-ID: | 20040706093945.H13939@megazone.bigpanda.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, 6 Jul 2004, Chris Smith wrote:
> Thanks for your reply. Here is the output of "explain analyze". I also
> replaced by simple values with a real query that gets run and (according to
> our profiling) takes a long time. However, the query is now taking a much
> shorter period of time than it was profiled at by the application. I can only
> guess the difference has something to do with system load. I could try to run
> this again during a high-load period, if that is necessary. Let me know.
It might help.
Notes and questions about the query and the plan below inline.
> miqweb=> explain analyze select distinct t0.* from UserAccount t0, UserMapping
> t1 where (t0.companyid = 628) and ((t0.companyid = 628) and (t0.userid =
> t1.use
> rid) and (t1.groupid in
> (628,948,949,950,951,953,954,1272,1279,1296,1299,1300,1
> 363,1423,1446,1467,1526,1724,1735,1759,1763,1772,1785,1841,1862,1975,2721,2800
> ,
> 2801,2802,2803,1264,1394,1525,1662,1843,1844,1845,1396,1528,1860,1846,1762,242
> 2
> ,1271,1847,1848,1281,1849,1850,1851,1266,1809,1852,1853,2421,1854,1855,1913,18
> 5
> 6,1857,1269,1268,1858,1859,2804))) and (t0.companyid = 628);
I wonder if in practice this query uses distinct to get around a problem
with subqueries. I'd think that a query of the general form:
select t0.* from UserAccount t0 where t0.companyid=62 and t0.userid in
(select userid from UserMapping t1 where t1.groupid in (...));
might work better (although it doesn't look like the extra steps are that
big in practice). For systems that don't do in well (older PostgreSQL
versions included), perhaps an exists would be better.
> Unique (cost=952.15..959.37 rows=289 width=55) (actual time=137.130..143.363
> r
> ows=752 loops=1)
> -> Sort (cost=952.15..952.87 rows=289 width=55) (actual
> time=137.123..138.0
> 04 rows=1328 loops=1)
> Sort Key: t0.userid, t0.companyid, t0.username, t0."password",
> t0.isact
> ive, t0.isregistered, t0.lastlogin, t0.firstname, t0.lastname
These steps are for the distinct. It's not alot of actual time, but if
the row set returned was large enough to exceed sort_mem the sort might
start going off to disk and be slower.
> -> Hash Join (cost=869.15..940.34 rows=289 width=55) (actual
> time=112
> .112..130.948 rows=1328 loops=1)
> Hash Cond: ("outer".userid = "inner".userid)
> -> Seq Scan on useraccount t0 (cost=0.00..55.71 rows=629
> width=
> 55) (actual time=0.239..8.501 rows=753 loops=1)
> Filter: (companyid = 628)
The row estimate is pretty reasonable, estimated 629 versus actual 753.
How many rows are in useraccount? I'm wondering if 629 is a reasonable
percentage of the rows to see if seq scan is reasonable here.
> -> Hash (cost=866.28..866.28 rows=1151 width=4) (actual
> time=11
> 1.762..111.762 rows=0 loops=1)
> -> Seq Scan on usermapping t1 (cost=0.00..866.28
> rows=115
> 1 width=4) (actual time=4.251..109.563 rows=1328 loops=1)
> [lots of filter conditions on groupid]
Here the estimate isn't so good, estimated 115 vs actual 1328. You might
want to consider raising the groupid column's statistics target and
re-analyzing to see if you can get a better estimate. It won't probably
directly affect this plan entry, but it might affect the layers above.
Also, how many rows are in usermapping?
I didn't see any indexes on usermapping. Perhaps an index on (userid,
groupid) would help as well.
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2004-07-06 17:24:12 | Re: alter table cascade does not give notice about dropped |
Previous Message | Jan Wieck | 2004-07-06 16:15:08 | Re: incremental backups? |