Slow join

From: Roman Kushnir <roman(dot)kushnir(at)ad2games(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow join
Date: 2018-06-25 15:55:49
Message-ID: 0B4D190F-CC48-4763-918D-4D4A8DAE5B0B@ad2games.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

The following basic inner join is taking too much time for me. (I’m using count(videos.id <http://videos.id/>) instead of count(*) because my actual query looks different, but I simplified it here to the essence).
I’ve tried following random people's suggestions and adjusting the random_page_cost(decreasing it from 4 to 1.1) without a stable improvement. Any hints on what is wrong here? Thank you.

The query

SELECT COUNT(videos.id) FROM videos JOIN accounts ON accounts.channel_id = videos.channel_id;

The accounts table has 744 rows, videos table has 2.2M rows, the join produces 135k rows.

Running on Amazon RDS, with default 10.1 parameters

version
---------------------------------------------------------------------------------------------------------
PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit

Execution plan https://explain.depesz.com/s/gf7 <https://explain.depesz.com/s/gf7>

Structure and statistics of the tables involved

=> \d videos
Table "public.videos"
Column | Type | Collation | Nullable | Default
------------------------+-----------------------------+-----------+----------+---------------------------------------------------
id | bigint | | not null | nextval('videos_id_seq'::regclass)
vendor_id | character varying | | not null |
channel_id | bigint | | |
published_at | timestamp without time zone | | |
title | text | | |
description | text | | |
thumbnails | jsonb | | |
tags | character varying[] | | |
category_id | character varying | | |
default_language | character varying | | |
default_audio_language | character varying | | |
duration | integer | | |
stereoscopic | boolean | | |
hd | boolean | | |
captioned | boolean | | |
licensed | boolean | | |
projection | character varying | | |
privacy_status | character varying | | |
license | character varying | | |
embeddable | boolean | | |
terminated_at | timestamp without time zone | | |
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
featured_game_id | bigint | | |
Indexes:
"videos_pkey" PRIMARY KEY, btree (id)
"index_videos_on_vendor_id" UNIQUE, btree (vendor_id)
"index_videos_on_channel_id" btree (channel_id)
"index_videos_on_featured_game_id" btree (featured_game_id)
Foreign-key constraints:
"fk_rails_257f68ae55" FOREIGN KEY (channel_id) REFERENCES channels(id)
"fk_rails_ce1b3e10b0" FOREIGN KEY (featured_game_id) REFERENCES games(id)
Referenced by:
TABLE "video_fetch_statuses" CONSTRAINT "fk_rails_3bfdf013b8" FOREIGN KEY (video_id) REFERENCES videos(id)
TABLE "video_daily_facts" CONSTRAINT "fk_rails_dc0eca9ebb" FOREIGN KEY (video_id) REFERENCES videos(id)

=> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='videos’;

relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
-----------------------+----------+-------------+---------------+---------+----------+----------------+------------+---------------
videos | 471495 | 2.25694e+06 | 471389 | r | 24 | f | | 4447764480

=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='channel_id' AND tablename='videos' ORDER BY 1 DESC;

frac_mcv | tablename | attname | n_distinct | n_mcv | n_hist
----------+-----------------------+----------------+------------+-------+--------
0.1704 | videos | channel_id | 1915 | 100 | 101

=> \d accounts
Table "public.accounts"
Column | Type | Collation | Nullable | Default
----------------+-----------------------------+-----------+----------+--------------------------------------------------
id | bigint | | not null | nextval('accounts_id_seq'::regclass)
channel_id | bigint | | not null |
refresh_token | character varying | | not null |
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
Indexes:
"accounts_pkey" PRIMARY KEY, btree (id)
"index_accounts_on_channel_id" UNIQUE, btree (channel_id)
"index_accounts_on_refresh_token" UNIQUE, btree (refresh_token)
Foreign-key constraints:
"fk_rails_11d6d9bea2" FOREIGN KEY (channel_id) REFERENCES channels(id)

=> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='accounts’;

relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
----------------------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
accounts | 23 | 744 | 23 | r | 5 | f | | 229376

=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='channel_id' AND tablename='accounts' ORDER BY 1 DESC;

frac_mcv | tablename | attname | n_distinct | n_mcv | n_hist
----------+----------------------+----------------+------------+-------+--------
| accounts | channel_id | -1 | | 101

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2018-06-25 16:45:22 Re: Slow join
Previous Message Amit Kapila 2018-06-22 09:23:36 Re: add default parallel query to v10 release notes? (Re: [PERFORM] performance drop after upgrade (9.6 > 10))