Partial index creation always scans the entire table

From: MingJu Wu <mingjuwu0505(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Partial index creation always scans the entire table
Date: 2020-02-15 11:04:48
Message-ID: CAKVFrvFY-f7kgwMRMiPLbPYMmgjc8Y2jjUGK_Y0HVcYAmU6ymg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

When creating partial indexes, can postgres utilize another index for
figuring which rows should be included in the partial index, without
performing a full table scan?

My scenario is that I have a table with 50M rows that are categorized into
10K categories. I need to create a partial index for each category. I have
created a index on the category column, hoping that postgres can use this
information when creating the partial indexes. However, postgres always
performs full table scan.

I've tested with PostgreSQL 12.2. Below is an example setup showing the
problem.

TEST1 shows that building a full index covering all rows takes 18 seconds.

TEST2 shows that creating a partial index for one of the category1
(category1=1) takes 3 seconds. This means that for creating 10K partial
indexes for each category, it will take over 8 hours. Compared to just 18
seconds in TEST1, it is much longer due to repeated full table scans.

TEST3 shows that even with another index (index_category2 created in SETUP)
covering category2, creating a partial index for one of the category2
(category2=1) still takes 3 seconds. I think postgres is still doing a full
table scan here.

My question is: can postgres utilize index_category2 is TEST3?

Thank you.

---------
-- SETUP
---------

CREATE TABLE test_data (
id bigint PRIMARY KEY,
category1 bigint,
category2 bigint
);

INSERT INTO test_data(id, category1, category2)
SELECT id, category, category FROM (
SELECT
generate_series(1, 50000000) AS id,
(random()*10000)::bigint AS category
) q;
-- Query returned successfully in 1 min 47 secs.

CREATE INDEX index_category2 ON test_data(category2);
-- Query returned successfully in 32 secs 347 msec.

--------------
-- TEST1: CREATE FULL INDEX
--------------

CREATE INDEX index_full ON test_data(id);
-- Query returned successfully in 18 secs 713 msec.

--------------
-- TEST2: CREATE PARTIAL INDEX, using category1
--------------

CREATE INDEX index_partial_1 ON test_data(id) WHERE category1=1;
-- Query returned successfully in 3 secs 523 msec.

--------------
-- TEST3: CREATE PARTIAL INDEX, using category2
--------------

CREATE INDEX index_partial_2 ON test_data(id) WHERE category2=1;
-- Query returned successfully in 3 secs 651 msec.

--- END ---

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Sergei Kornilov 2020-02-15 12:47:51 Re: Partial index creation always scans the entire table
Previous Message Pavel Stehule 2020-02-15 08:05:01 Re: proposal: schema variables