Avoid sorting when doing an array_agg

From: Alexis Woo <awoo2611(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Avoid sorting when doing an array_agg
Date: 2016-12-02 17:25:07
Message-ID: CAPJ98W5OBZYerh1N-u1G1q_SL7KeCVu7KY6ojDn1bEyFjLfRuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a users table which contains ~70 million rows that looks like this:

Column | Type |
-------------+-------------------+
id | integer |
first_name | character varying |
last_name | character varying |
category_id | integer |
Indexes:
"users_id_idx" btree (id)
"users_category_id_first_name_last_name_idx" btree (category_id,
first_name, last_name)

I'm trying to retrieve the ids for each (first_name, last_name) couple for
one specific category_id.
The query that I'm currently doing is the following:

select array_agg(id)
from users
where category_id = 5432
group by first_name, last_name;

For which the explain analyze output is the following:

GroupAggregate (cost=618461.35..626719.42 rows=26881 width=19) (actual
time=1683.139..2613.386 rows=102943 loops=1)
Group Key: first_name, last_name
-> Sort (cost=618461.35..620441.86 rows=792206 width=19) (actual
time=1683.116..2368.904 rows=849428 loops=1)
Sort Key: first_name, last_name
Sort Method: external merge Disk: 25304kB
-> Bitmap Heap Scan on users (cost=26844.16..524595.92
rows=792206 width=19) (actual time=86.046..229.469 rows=849428 loops=1)
Recheck Cond: (category_id = 5432)
Heap Blocks: exact=7938
-> Bitmap Index Scan on
users_category_id_first_name_last_name_idx (cost=0.00..26646.11
rows=792206 width=0) (actual time=85.006..85.006 rows=849428 loops=1)
Index Cond: (category_id = 5432)

What seems to greatly decrease the performance of the query is the "Sort
Method: external merge Disk: 7526kB."

Is it possible to aggregate the ids without doing a sort ?
If not, what other options, apart from increasing the work_mem, do I have ?

Thanks,

Alexis

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom DalPozzo 2016-12-02 17:40:14 INSERT - UPDATE throughput oscillating and SSD activity after stopping the client
Previous Message Andreas Joseph Krogh 2016-12-02 17:05:59 Re: Full text search randomly not working for short prefixes?