From: | Madison Kelly <linux(at)alteeve(dot)com> |
---|---|
To: | PgSQL General List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Question on a select |
Date: | 2005-01-02 07:55:15 |
Message-ID: | 41D7A8E3.1050507@alteeve.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Bruno Wolff III wrote:
> There should be parenthesis around the list to test.
> WHERE a_name, a_type, a_dir NOT IN (
> should be
> WHERE (a_name, a_type, a_dir) NOT IN (
That did it (I think)!
>
> I believe that the NOT IN query should run comparably to the LEFT JOIN
> example supplied by the other person (at least in recent versions of
> Postgres). I would expect this to run faster than using NOT EXISTS.
> You probably want to try all 3. The semantics of the three ways of doing
> this are not all equivalent if there are NULLs in the data being used
> to eliminate rows. As you indicated you don't have NULLs this shouldn't
> be a problem.
>
> Another way to write this is using set different (EXCEPT or EXCEPT ALL)
> using the key fields and then joining back to table a to pick up the
> other fields. However this will almost certianly be slower than the
> other methods.
Something odd, now that I have the other method working (I think)...
tle-bu=> SELECT a.file_name, a.file_parent_dir, a.file_type FROM
file_info_1 a LEFT JOIN file_set_1 b ON a.file_name=b.fs_name AND
a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type WHERE
b.fs_name IS NULL;
returns the results in roughly 1 or 2 seconds on a test data set of
15,000 entries. I have an index on both 'file_info_1' covering
'fs_name', 'fs_parent_dir' and 'fs_type' and on 'file_set_1' covering
'file_name', 'file_parent_dir' and 'file_type'. When I try the seconds
method though:
tle-bu=> SELECT file_name, file_parent_dir, file_type FROM file_info_1
WHERE (file_name, file_parent_dir, file_type) NOT IN (SELECT fs_name,
fs_parent_dir, fs_type FROM file_set_1);
It took so long to process that after roughly three minutes I stopped
the query for fear of overheating my laptop (which happend a while back
forcing a thermal shut down).
The indexes are:
CREATE INDEX file_info_#_display_idx ON file_info_# (file_type,
file_parent_dir, file_name);
CREATE INDEX file_set_#_sync_idx ON file_set_# (fs_name, fs_parent_dir,
fs_type)
Are these not effective for the second query? If not, what should I
change or add? If so, would you have any insight into why there is such
an incredible difference in performance?
Thanks very much again!!
Madison
From | Date | Subject | |
---|---|---|---|
Next Message | Vincent Hikida | 2005-01-02 09:52:05 | Re: Question on a select |
Previous Message | Bruno Wolff III | 2005-01-02 07:52:49 | Re: Question on a select |