Re: how to improve perf of 131MM row table?

From: AJ Weber <aweber(at)comcast(dot)net>
To: Shaun Thomas <sthomas(at)optionshouse(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how to improve perf of 131MM row table?
Date: 2014-06-26 14:22:59
Message-ID: 53AC2CC3.9060403@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I sent the details as identified by pgAdmin III.

psql output shows this:
\d alf_node
Table "public.alf_node"
Column | Type | Modifiers
----------------+------------------------+-----------
id | bigint | not null
version | bigint | not null
store_id | bigint | not null
uuid | character varying(36) | not null
transaction_id | bigint | not null
node_deleted | boolean | not null
type_qname_id | bigint | not null
locale_id | bigint | not null
acl_id | bigint |
audit_creator | character varying(255) |
audit_created | character varying(30) |
audit_modifier | character varying(255) |
audit_modified | character varying(30) |
audit_accessed | character varying(30) |
Indexes:
"alf_node_pkey" PRIMARY KEY, btree (id) CLUSTER
"store_id" UNIQUE, btree (store_id, uuid)
"fk_alf_node_acl" btree (acl_id)
"fk_alf_node_loc" btree (locale_id)
"fk_alf_node_store" btree (store_id)
"fk_alf_node_tqn" btree (type_qname_id)
"fk_alf_node_txn" btree (transaction_id)
"idx_alf_node_del" btree (node_deleted)
"idx_alf_node_txn_del" btree (transaction_id, node_deleted)
Foreign-key constraints:
"fk_alf_node_acl" FOREIGN KEY (acl_id) REFERENCES
alf_access_control_list(id)
"fk_alf_node_loc" FOREIGN KEY (locale_id) REFERENCES alf_locale(id)
"fk_alf_node_store" FOREIGN KEY (store_id) REFERENCES alf_store(id)
"fk_alf_node_tqn" FOREIGN KEY (type_qname_id) REFERENCES alf_qname(id)
"fk_alf_node_txn" FOREIGN KEY (transaction_id) REFERENCES
alf_transaction(id)
Referenced by:
TABLE "alf_child_assoc" CONSTRAINT "fk_alf_cass_cnode" FOREIGN KEY
(child_node_id) REFERENCES alf_node(id)
TABLE "alf_child_assoc" CONSTRAINT "fk_alf_cass_pnode" FOREIGN KEY
(parent_node_id) REFERENCES alf_node(id)
TABLE "alf_node_aspects" CONSTRAINT "fk_alf_nasp_n" FOREIGN KEY
(node_id) REFERENCES alf_node(id)
TABLE "alf_node_assoc" CONSTRAINT "fk_alf_nass_snode" FOREIGN KEY
(source_node_id) REFERENCES alf_node(id)
TABLE "alf_node_assoc" CONSTRAINT "fk_alf_nass_tnode" FOREIGN KEY
(target_node_id) REFERENCES alf_node(id)
TABLE "alf_node_properties" CONSTRAINT "fk_alf_nprop_n" FOREIGN KEY
(node_id) REFERENCES alf_node(id)
TABLE "alf_store" CONSTRAINT "fk_alf_store_root" FOREIGN KEY
(root_node_id) REFERENCES alf_node(id)
TABLE "alf_subscriptions" CONSTRAINT "fk_alf_sub_node" FOREIGN KEY
(node_id) REFERENCES alf_node(id) ON DELETE CASCADE
TABLE "alf_subscriptions" CONSTRAINT "fk_alf_sub_user" FOREIGN KEY
(user_node_id) REFERENCES alf_node(id) ON DELETE CASCADE
TABLE "alf_usage_delta" CONSTRAINT "fk_alf_usaged_n" FOREIGN KEY
(node_id) REFERENCES alf_node(id)

This line of the output:
"alf_node_pkey" PRIMARY KEY, btree (id) CLUSTER
would indicate to me that there is a PK on alf_node table, it is on
column "id", it is of type btree, and the table is clustered around that
index.

Am I reading this totally wrong?

The supporting table actually seems to have a multi-column PK defined,
and a separate btree index on node_id as you mentioned.

-AJ

On 6/26/2014 10:05 AM, Shaun Thomas wrote:
> On 06/26/2014 08:26 AM, AJ Weber wrote:
>
>> The "master table" definition is attached as "table1.sql".
>> The "detail table" definition is attached as "table2.sql".
>
> I'm not sure what you think a primary key is, but neither of these
> tables have one. Primary keys are declared one of two ways:
>
> CREATE TABLE foo
> (
> id BIGINT PRIMARY KEY,
> col1 VARCHAR,
> col2 INT
> );
>
> Or this:
>
> CREATE TABLE foo
> (
> id BIGINT,
> col1 VARCHAR,
> col2 INT
> );
>
> ALTER TABLE foo ADD constraint pk_foo PRIMARY KEY (id);
>
> On your alf_node_properties table, you only have an index on node_id
> because you created one. If you look at your alf_node table, there is
> no index on the id column at all. This is confirmed by the explain
> output you attached:
>
> Seq Scan on alf_node node (cost=0.00..227265.29 rows=5733429
> width=16) (actual time=0.013..2029.649 rows=5733888 loops=1)
>
> Since it has no index, the database is reading the entire table to
> find your matching values. Then it's using the index on node_id in the
> other table to find the 'detail' matches, as seen here:
>
> Bitmap Index Scan on fk_alf_nprop_n (cost=0.00..1240.00 rows=52790
> width=0) (actual time=0.552..0.552 rows=1071 loops=1)
>
> Add an actual primary key to your alf_node table, and your query
> performance should improve substantially. But I also strongly suggest
> you spend some time learning how to read an EXPLAIN plan, as that
> would have made your problem obvious immediately.
>
> Here's a link for your version:
>
> http://www.postgresql.org/docs/9.0/static/sql-explain.html
>
> You should still consider upgrading to the latest release of 9.0 too.
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Shaun Thomas 2014-06-26 14:37:24 Re: how to improve perf of 131MM row table?
Previous Message Matheus de Oliveira 2014-06-26 14:14:27 Re: how to improve perf of 131MM row table?