From: | <mallah(at)trade-india(dot)com> |
---|---|
To: | <josh(at)agliodbs(dot)com> |
Cc: | <mallah(at)trade-india(dot)com>, <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: Why is seq search preferred here by planner? |
Date: | 2003-04-23 17:54:45 |
Message-ID: | 1286.219.65.233.8.1051120485.squirrel@mail.trade-india.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
> Mallah,
>
>> Hmm i am not running postgresql on a CRAY :-)
>>
>> that was the time for "begin work;" since
>> i am explain analysing an update i am putting it in transaction.
>>
>> the actualt time was nearing 300 secs which is 5 mins
>>
>> i did an explain of the UPDATE FROM variant of the query it has the same plan for it.
>
> It's the same query. When you forget the FROM, postgres tries to insert it for you ... in
> fact, you generally get a notice like "Inserting missing FROM clause for table
> 'personal_account_details'".
Josh when i get that such Notices the result of update is usually worng.
I do not get that notice for the SQL i posted.
>
> Am I reading your anaylze right?
i think u may have missed parts of it.
I'm never 100% sure with 7.2 ... you *are* updating 150,000
> rows?
7.2 u mean , pgsql 7.2 ? No i am running pgsql 7.3.2
yes i did update 1,50,000 rows that time.
>
> If I'm right, then the query is using a seq scan because it's faster than an index scan for a
> large number of rows. You can always test this by runninng the query after an SET
> ENABLE_SEQSCAN = FALSE; generally the resulting query will take 5-10 times as long.
Yes as explained in docs (somewhere ) and by many ppl seq scan is faster when
majority of the rows are stisfy the search criteria.
my botheration was that if pgsql were to search the personal_account_details with userid
(which is incidently the pkey there) why would it not use the uniq index ?
logically the query shud be looking the coutry feild from the personal_account_details
for every userid (row) encoutered in email_bank table.
may be i am wrong here in correctly under standing the explain output.
tradein_clients=# explain UPDATE email_bank set country=personal_account_details.country FROM
personal_account_details where email_bank.userid > 0 and
email_bank.userid=personal_account_details.userid and ( email_bank.country <>
personal_account_details.country or email_bank.country IS NULL ); QUERY PLAN
----------------------------------------------------------------------------------------------
Hash Join (cost=14497.06..37806.55 rows=186849 width=144)
Hash Cond: ("outer".userid = "inner".userid)
Join Filter: (("outer".country <> "inner".country) OR ("outer".country IS NULL))
-> Seq Scan on email_bank (cost=0.00..16268.10 rows=216658 width=130)
Filter: (userid > 0)
-> Hash (cost=14113.45..14113.45 rows=153445 width=14)
-> Seq Scan on personal_account_details (cost=0.00..14113.45 rows=153445 width=14)
(7 rows)
I am bothered abt the second seq scan , sorry i do not understand "Hash" is there any docs that
explain me that ?
>
> Your query delay time is updating 150,000 rows, period. That's a *lot* of disk activity, and
> it can only be improved by adjusting your postgresql.conf, the disk locations of your files,
> the speed of your disk array, and your I/O bandwidth.
my WAL logs (pg_xlog) are already in a dedicated disk [ buts thats a different topic ]
>
> How long does this take?
>
> UPDATE email_bank set country=personal_account_details.country
> FROM personal_account_details
> where userid > 0 and userid=personal_account_details.userid
> and email_bank.country <> personal_account_details.country
> or email_bank.country IS NULL;
Yes It took very less , because there were actually very less rows to update .
I agree the "email_bank.country <> personal_account_details.country or email_bank.country IS NULL"
was a neat thing to do in last query which i didnt' :-(
this time since most of the rows were already equal becoz i updated them a little while
back with that 5 mins query ;-)
anyway could u plez explain the second "seq scan part" ?
HERE WAS THE EXPLIAN ANALYZE OUTPUT:
hey dont you think the parentheis i added were necessary for the query
to update proper rows ? becoz the "or" part could have evaluated to true even for
no matching userids resulting in update of unecessary rows.
tradein_clients=# begin work ; explain analyze UPDATE email_bank set
country=personal_account_details.country FROM personal_account_details where email_bank.userid > 0
and email_bank.userid=personal_account_details.userid and ( email_bank.country <>
personal_account_details.country or email_bank.country IS NULL );BEGIN
Time: 720.04 ms
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=14497.06..37806.55 rows=186849 width=144) (actual time=5874.10..8754.98
rows=1033 loops=1) Hash Cond: ("outer".userid = "inner".userid)
Join Filter: (("outer".country <> "inner".country) OR ("outer".country IS NULL))
-> Seq Scan on email_bank (cost=0.00..16268.10 rows=216658 width=130) (actual
time=0.83..1361.35 rows=156669 loops=1) Filter: (userid > 0)
-> Hash (cost=14113.45..14113.45 rows=153445 width=14) (actual time=5855.59..5855.59 rows=0
loops=1) -> Seq Scan on personal_account_details (cost=0.00..14113.45 rows=153445 width=14)
(actual time=0.04..5550.93 rows=153466 loops=1) Total runtime: 9370.74 msec
(8 rows)
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
>
> ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9'
> the postmaster
-----------------------------------------
Get your free web based email at trade-india.com.
"India's Leading B2B eMarketplace.!"
http://www.trade-india.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Huxton | 2003-04-23 17:55:36 | Re: OUTER JOIN |
Previous Message | Rod Taylor | 2003-04-23 17:48:38 | Re: Optomizing left outer joins |