Wei Wang,
> How exactly slow is DISTINCT being processed in SQL engines? (not
> limited to postgresql, though comments on postgresql would be most
> relevant)
I can only give you a relative result, based exclusively on my anecdotal
experience with 7.1:
Fast:    SELECT ...
Slower:  SELECT ... GROUP BY x,y,z
     or: SELECT DISTINCT ON (x) ... (Postgres non-standard extension)
Slowest: SELECT DISTINCT ...
The reason for this is that SELECT DISTINCT is effectively a GROUP BY
on all result fields of the query, and if a few of them aren't indexed,
that requires a seq scan.
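To make the "DISTINCT is effectively a GROUP BY on all result fields" point concrete, here is a minimal sketch. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Postgres, and the table "orders" and its columns are made up for the example. It shows that the two queries return the same set of rows; it says nothing about timings or query plans on 7.1.

```python
import sqlite3

# Hypothetical table and data, purely for illustration (not from the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "widget", 1),
        ("ann", "widget", 1),  # exact duplicate row
        ("bob", "gadget", 2),
        ("bob", "widget", 2),
        ("bob", "gadget", 2),  # exact duplicate row
    ],
)

# DISTINCT over the whole select list ...
distinct_rows = conn.execute(
    "SELECT DISTINCT customer, product, qty FROM orders"
).fetchall()

# ... versus GROUP BY on every selected column.
grouped_rows = conn.execute(
    "SELECT customer, product, qty FROM orders "
    "GROUP BY customer, product, qty"
).fetchall()

# Row order is not guaranteed by either query, so compare sorted lists.
same_result = sorted(distinct_rows) == sorted(grouped_rows)
print(same_result)
```

Both queries have to compare every selected column to find duplicates, which is why the width of the select list matters for the cost.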
If performance is an issue, you may wish to consider restructuring your
queries and/or data model to eliminate the actual duplicate rows.
-Josh