Re: JOIN issues (Left vs Right for sorting), and "Nested Loop" problem

From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
Cc: "Postgres General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: JOIN issues (Left vs Right for sorting), and "Nested Loop" problem
Date: 2007-09-01 12:13:02
Message-ID: 5611403A-BBDC-48D0-B818-DC0EA123DAD1@magproductions.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Sep 1, 2007, at 11:46, Phoenix Kiula wrote:

> Hello,
>
> I have a simple query as follows. It joins two very straightforward
> tables.
>
>
> SELECT
> trades.id,
> trades.url,
> trades.alias,
> tradecount.t_count,
> tradecount.u_count
> FROM trades
> LEFT JOIN tradecount ON trades.id = tradecount.id
> WHERE trades.user_id = 'jondoe' and trades.status = 'Y'
> ORDER BY
> tradecount.u_count desc
> OFFSET 20 LIMIT 10
>
>
> Both the tables have a bigint "id" field that connects them. The table
> definitions are included below:
>
>
>
>
> Table "public.trades"
>
> Column | Type |
> Modifiers
> -----------------------+-----------------------------
> +------------------------------
> id | bigint | not null
> user_id | character varying(45) | not null
> url | text | not null
> alias | character varying(20) | not null
> title | character varying(500) |
> private | character(1) |
> status | character(1) | default
> 'Y'::bpchar
> modify_date | timestamp without time zone |
> disable_in_statistics | character(1) | not null
> default 'N'::bpchar
> Indexes:
> "trades_pkey" PRIMARY KEY, btree (id)
> "trades_unique_alias" UNIQUE, btree (alias)
> "idx_trades_mdate" btree (modify_date)
> "idx_trades_userid" btree (user_id)
> Check constraints:
> "trades_alias_valid" CHECK (alias::text ~ '[-A-Za-z0-9_]'::text)
> "trades_id_check" CHECK (id > 0)
> "trades_url_check" CHECK (url <> ''::text)
> "trades_user_id_check" CHECK (user_id::text <> ''::text)
>
>
>
>
>
> Table "public.tradecount"
>
> Column | Type | Modifiers
> --------------+-----------------------------+--------------------
> id | bigint | not null
> t_count | integer | not null default 0
> u_count | integer | not null default 0
> modify_date | timestamp without time zone | default now()
> Indexes:
> "tradecount_pkey" PRIMARY KEY, btree (id)
> "i_tradecount_uc" btree (u_count)
> "i_tradecount_vc" btree (t_count)
> Foreign-key constraints:
> "fk_tradecount_trades_id" FOREIGN KEY (id) REFERENCES trades(id)
> ON DELETE CASCADE
> Rules:
> replace_tradecount_on_duplicate_insert AS
> ON INSERT TO tradecount
> WHERE (EXISTS ( SELECT 1
> FROM tradecount
> WHERE tradecount.id = new.id)) DO INSTEAD UPDATE tradecount
> SET t_count = tradecount.t_count, u_count = tradecount.u_count
> WHERE tradecount.id = new.id
>
>
>
>
> Now I have two problems:
>
>
> 1. The above query takes more time to fire up that an index should
> really take. I have bitmap heap scan off in conf file, and indexscan
> on, otherwise this was going into a bitmap heap thing.
>
> As you will see from the SQL above, the trades.user_id index should be
> limiting the number of rows to a few hundred (or thousand at max) and
> then we are trying to get only 10 tuples based on the OFFSET and LIMIT
> clauses.
>
> However, there's a nested loop in there as the EXPLAIN ANALYZE shows
> below. What is causing this nested loop?

It looks like it's used to match trades to tradecounts. I think that
makes sense, as the number of matching records from both tables isn't
necessarily equal. The query is looping over trades until each
tradecount has all its trades (for user 'jondoe' with status 'Y')
associated.

It is kind of confusing that you're using the id column in
tradecounts for both primary key and foreign key, and I'm not sure
what that implies to the query planner. It suggests that there can be
only (up to) one tradecounts record for each trade count, but it
appears that either the planner doesn't realise that...

Is 10 ms problematic for this query?

> QUERY PLAN
> ----------------------------------------------------------------------
> --------------------------------------------------
> Limit (cost=4829.70..4829.73 rows=10 width=125) (actual
> time=9.784..9.835 rows=10 loops=1)
> -> Sort (cost=4829.65..4830.61 rows=385 width=125) (actual
> time=9.703..9.757 rows=30 loops=1)
> Sort Key: tradecount.u_count
> -> Nested Loop Left Join (cost=0.00..4813.12 rows=385
> width=125) (actual time=0.075..8.662 rows=386 loops=1)
> -> Index Scan using idx_trades_userid on trades
> (cost=0.00..1556.08 rows=385 width=117) (actual time=0.05
> 0..1.225 rows=386 loops=1)
> Index Cond: ((user_id)::text = 'jondoe'::text)
> Filter: (status = 'Y'::bpchar)
> -> Index Scan using tradecount_pkey on tradecount
> (cost=0.00..8.45 rows=1 width=16) (actual time=0.006.
> .0.008 rows=1 loops=386)
> Index Cond: (trades.id = tradecount.id)
> Total runtime: 9.963 ms
> (10 rows)
>
>
>
>
> 2. Secondly, if I want to sort the join by a column on the second
> table, then the rows returned are not really sorted unless I do a
> RIGHT JOIN (my sql above shows a LEFT JOIN). Getting results from a
> right join is fine as long as the column is not null in the second
> table, but if it is null, then nothing is returned. This is why I do a
> LEFT join in the first place! So my question: how can I do a left
> join, which is the logic that I wish to accomplish, but get the
> sorting to work from the second table and if a column is null then
> just return as 0 instead of nothing at all? (The LEFT JOIN used to
> work in Mysql).

That's very odd, the right join should work fine.
You constrain tradecounts to require a matching record in trades, so
a right join with tradecounts can not return NULL values for columns
in trades; Except where status is null (which is possible), in which
case the record doesn't match your WHERE-clause. Are you sure you
don't have NULL values for statuses?

You don't say what "not really sorted" means in your left joins, but
I expect the rows with NULL values for u_count to be grouped together
at the top (in no particular order, you didn't specify any other
order than by u_count) and the rest ordered by u_count as expected.

You could use ORDER BY COALESCE(tradecount.u_count, 0) desc if you
want it to behave like you say mysql sorted it.

Apparently mysql treats NULL values as 0 when ordering? Or do they
just order them first instead of last like PG does (which is just a
matter of preference, really)?

You should realize that NULL means 'unknown', so theoretically you
could encounter databases that put them at "random" places in your
otherwise sorted result set, not touching their position among other
records because they can't know how to compare them.

If you want certain behaviour from NULL values you'll need to specify
what you want or expect surprises ;)

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

!DSPAM:737,46d95276289901944772347!

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alban Hertroys 2007-09-01 12:24:25 Re: Obtaining random rows from a result set
Previous Message Andrus 2007-09-01 11:12:19 Reporting services for PostgreSQL