The query plan get all columns but I'm using only one column.

From: Moises Lopez <moylop260(at)vauxoo(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: The query plan get all columns but I'm using only one column.
Date: 2020-04-24 21:11:12
Message-ID: CAGro9RXViOXNBBA-SZur_UbpWrbkg_Ow4ha13FJu9dOJ=WdB7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We have production database that has slow queries because of the query get
all columns even if I'm using only one column.
The result is slow for tables that there are too much columns
The weird part is that there is environment that I can't reproduce it even
if they are using the same postgresql.conf
I didn't find what is the variant/configuration to avoid it
I could reproduce it using the official docker image of postgresql

* Steps to reproduce it

1. Run the following script:
docker run --name psql1 -d -e POSTGRES_PASSWORD=pwd postgres
docker exec -it --user=postgres psql1 psql
# Into docker container
CREATE DATABASE db;
\connect db;
CREATE TABLE link (
ID serial PRIMARY KEY,
url VARCHAR (255) NOT NULL,
name VARCHAR (255) NOT NULL,
description VARCHAR (255),
rel VARCHAR (50)
);
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT l1.url
FROM link l1
JOIN link l2
ON l1.url=l2.url;

2. See result of the Query Plan:
QUERY PLAN

-------------------------------------------------------------------------------------------
Hash Join (cost=10.90..21.85 rows=40 width=516) (actual
time=0.080..0.081 rows=1 loops=1)
Output: l1.url
Hash Cond: ((l1.url)::text = (l2.url)::text)
Buffers: shared hit=5
-> Seq Scan on public.link l1 (cost=0.00..10.40 rows=40 width=516)
(actual time=0.010..0.011 rows=1 loops=1)
* Output: l1.id <http://l1.id>, l1.url, l1.name
<http://l1.name>, l1.description, l1.rel*
Buffers: shared hit=1
-> Hash (cost=10.40..10.40 rows=40 width=516) (actual
time=0.021..0.021 rows=1 loops=1)
Output: l2.url
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on public.link l2 (cost=0.00..10.40 rows=40
width=516) (actual time=0.010..0.011 rows=1 loops=1)
Output: l2.url
Buffers: shared hit=1
Planning Time: 0.564 ms
Execution Time: 0.142 ms

3. Notice that I'm using only the column "url" for "JOIN" and "SELECT"
section,
but the "Output" section is returning all columns.

Is there a manner to avoid returning all columns in order to get a better
performance?

Thank you in advance

* PostgreSQL version:

psql postgres -c "SELECT version()"
PostgreSQL 12.2 (Debian 12.2-2.pgdg100+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

Changes made to the settings in the postgresql.conf file: see Server
Configuration for a quick way to list them all.
without changes

Operating system and version:
cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"

--
Moisés López
@moylop260

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2020-04-24 22:59:41 Re: PostgreSQL does not choose my indexes well
Previous Message Tom Lane 2020-04-24 19:39:52 Re: PostgreSQL does not choose my indexes well