From: | Peter Hunsberger <peter(dot)hunsberger(at)gmail(dot)com> |
---|---|
To: | Marcin Krol <mrkafk(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: The REAL cost of joins |
Date: | 2010-03-03 22:33:49 |
Message-ID: | cc159a4a1003031433r27c8de5erea85d0b841117b9@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, Mar 3, 2010 at 3:59 PM, Marcin Krol <mrkafk(at)gmail(dot)com> wrote:
>
> Hello everyone,
>
> I have inadvertently set off a furball on an unrelated ng on what is the actual cost of SQL joins. But there's no reliable conclusion. I would like to actually know that, that is, are JOINs truly expensive?
>
> As they say, one measurement is worth a thousand opinions, so I've done measurement on my PG app:
>
> $ time echo "\c hrs;
> SELECT hosts.ip, reservation.start_date, architecture.architecture, os_kind.os_kind, os_rel.os_rel, os_version.os_version, project.project, email.email FROM hosts
> INNER JOIN project ON project.id = hosts.project_id
> INNER JOIN architecture ON hosts.architecture_id = architecture.id
> INNER JOIN os_kind ON os_kind.id = hosts.os_kind_id
> INNER JOIN os_rel ON hosts.os_rel_id = os_rel.id
> INNER JOIN os_version ON hosts.os_version_id = os_version.id
> INNER JOIN reservation_hosts ON hosts.id = reservation_hosts.host_id
> INNER JOIN reservation on reservation.id = reservation_hosts.reservation_id
> INNER JOIN email ON reservation.email_id = email.id
>
> ;" | psql > /dev/null
>
> real 0m0.099s
> user 0m0.015s
> sys 0m0.005s
>
>
>
> $ time echo "\c hrs;
> > SELECT hosts.ip FROM hosts;
> > SELECT reservation.start_date FROM reservation;
> > SELECT architecture.architecture FROM architecture;
> > SELECT os_rel.os_rel FROM os_rel;
> > SELECT os_version.os_version FROM os_version;
> > SELECT project.project FROM project;
> > SELECT email.email FROM email;
> > " | psql > /dev/null
>
> real 0m0.046s
> user 0m0.008s
> sys 0m0.004s
>
> Note: I've created indexes on those tables, both on data columns like hosts.ip and on .id columns.
>
> What do you think of this? And in general: when (if?) should one denormalize data?
I don't get what these two examples are supposed to show? They do
not, in general, create equivalent output data. If you had to write
procedural logic to do the same correlation between the individual
result sets as the joins are performing then you'd be back to at least
the same performance and probably worse, so what exactly is being
compared here?
As to your second question; well, outside of not doing premature
optimization I'll observe that for operational databases,
denormalization rarely gives me enough savings to justify the
resultant extra data management required to maintain it. If you're
talking about a data warehouse (and given the way you've worded the
question I supsect you are not?) then that's a different question, in
that case I prefer to generate the data from the operational side of
the house so you're no longer paying an operational cost to maintain
the denormalized data (you've got a whole different set of costs
instead).
--
Peter Hunsberger
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Graf | 2010-03-03 22:34:02 | Re: Scratching my head why results are different between machines. |
Previous Message | Bruce Momjian | 2010-03-03 22:30:01 | Re: [GENERAL] to_timestamp() and quartersf |