Re: optimalisation with EXCEPT clause

From: "Kincel, Martin" <MKincel(at)soitron(dot)com>
To: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: optimalisation with EXCEPT clause
Date: 2010-04-15 08:57:42
Message-ID: A5ED43533E983E4685C9E6156BE8874F0840DCBA@kenya.tronet.as
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you for the answer Grzegorz.

> if you have a primary key on the table, and you should, you might get better performance using LEFT JOIN.

Well as far as I know, the result of such JOIN is a cartezian product, which is not exactly what I need. I need the same structure as table 'data' has. Or am I missing a trick how LEFT OUTER JOIN can be used instead of EXCEPT? :)

> EXCEPT will compare all columns, which might not be that fast, especially if those are text. (hence why I always tell others to use int as key in a table, but that's a different story). 

There is no good int to start using as a key in my 'data'. I would have to create one (out of some hash function, diggesting the whole row probably), but there is a strong possibility of adding colums into 'data' latter on, which would require recalculation of such 'hash' column over and over again for millions of rows. While not impossible, cerainly something I would like to avoid.
Moreover, if one creates and maintains such hash column by hand and on his own, it is very likely, that he will forgot something, or even mess it up completely. However, if there is a tool (something like an index on all colums) available in the database itself, I would be eager to use it.

Thanks again,
Winco

> --
> GJ

* From: "Kincel, Martin" <MKincel(at)soitron(dot)com>
* To: <pgsql-general(at)postgresql(dot)org>
* Subject: optimalisation with EXCEPT clause
* Date: Tue, 13 Apr 2010 17:01:18 +0200
* Message-id: <A5ED43533E983E4685C9E6156BE8874F0840D83D(at)kenya(dot)tronet(dot)as>

Hello,

everyday I collect a couple of thousands rows of unique data from our
systems and I INSERT them into the table. Since I need no duplicate
data, I use EXCEPT clause when INSERTing, like this:

===
INSERT INTO data SELECT * FROM new_collected_data() EXCEPT SELECT * FROM
data;
===

It works exactly as I need, but there is a small issue I am thinking
about how to improve. Yes it's performance, what else? :)

Since I am INSERTing new_collected_data() in 10000-rows chunks into a
table already containing millions of rows, it takes a few minutes
(literally), which is something I understand and accept.
However, I am wondering whether there is any way how to improve the
performance, either via indices, or ALTERing TABLE with UNIQUE
constraint or something else I might have completely forgot about.

Does anyone have any recommended approach how to speed up queries
containing EXCEPT clause?

Thanks a lot,
Winco

In response to

Browse pgsql-general by date

  From Date Subject
Next Message dipti shah 2010-04-15 09:34:28 Re: How to get whether user has ALL permissions on table?
Previous Message A. Kretschmer 2010-04-15 07:57:25 Re: How to get whether user has ALL permissions on table?