From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Tom Burke" <lists(at)spamex(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Updates are slow.. |
Date: | 2002-06-12 15:00:47 |
Message-ID: | 9762.1023894047@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
"Tom Burke" <lists(at)spamex(dot)com> writes:
> After creating this index, the update time jumped up again to 21 mins!
> real 21m3.536s
> user 0m0.010s
> sys 0m0.000s
> This makes no sense to me because the new index is not related to
> the column being updated at all.
Doesn't matter: we're entering a new tuple (new version of the row) so
new entries must be made in all indexes for the table.
> Actually, it is not unique.
> This is not really an emp table, and in fact ~600K of the email
> addresses are duplicates.
Ah so. The btree code doesn't really like huge numbers of duplicate
keys --- it still works, but the algorithms degenerate to O(N) instead
of O(log N). I imagine the previous two-column incarnation with
(email, dept_id) was also quite non-unique?
So the bottom line is not to bother with making indexes on highly
redundant columns; they aren't fast and they don't do you any good
anyway.
If the situation is that you've got, say, 600K empty email addresses and
another 600K that are actually useful, you might consider making a
partial index that excludes the empty addresses; then that could be used
to look up real addresses without paying the overhead to index the empty
ones. See
http://www.postgresql.org/users-lounge/docs/7.2/postgres/indexes-partial.html
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Manfred Koizar | 2002-06-12 15:08:40 | Re: Updates are slow.. |
Previous Message | Tom Lane | 2002-06-12 14:41:13 | Re: dynamic querys |