Re: Improvement for query planner? (no, not about count(*) again ;-))

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Tobias Völk <tobias(dot)voelk(at)t-online(dot)de>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Improvement for query planner? (no, not about count(*) again ;-))
Date: 2020-07-20 14:44:16
Message-ID: CA+bJJbxKjRPSwJGz8vj2Y_8Azmnkzs0Z8y-D7xuU+miy6v5+3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-general

Tobias, 1st some etiquette stuff.

- You have replied just to me, directly. I'm CCing the list. Remember
to use reply all. Usual practice in the postgres lists is to reply to
the list and everyone involved in the thread ( doing reply all
achieves this normally ).

- It's not a biggie in this particular mail, but please do not
top-post, specially if you want to get answers on any complex
question. Trim unnecessary parts from the quoted text and reply below.
Having to scroll to a big chunk which includes even my signature is
not a thing I like.

On Mon, Jul 20, 2020 at 2:23 PM <tobias(dot)voelk(at)t-online(dot)de> wrote:
> I have tried the queries
> select name1 from games union select name2 from games
> but not
> select distinct name1 from games union
> select distinct name2 from games
> since it's just the same and easy for the optimizer to realize (I thought?)

There are several other things you have not done. You have not
provided any info ( statistics ) on your table, just the cardinality.
You have not provided any explain output, which is normally needed if
you really want people to help you.

On the subject, I'm not really sure both queries are identical or can
be optimized, but when one does bulk queries like that it is better to
help it. Your query is a corner case, a one of loading, and many
optimizers do not catch that things properly, as putting code for them
means increasing bug surface on a feature of dubious utility ( I,
personally, would prefer having to put your sample query manually than
risking optimizer bugs OR paying the price of the optimizer trying to
catch that on every query I send ).

Also, the optimizer may be catching many things, a explain output may
help to see what it's doing.

> I've given Postgres a few Gigs (I think 4?) as work_mem, having 16 GB in total. Still it's not using them.

This does not seem correct. work_mem is per operation, it can be used
several times on a single query. See
https://www.postgresql.org/docs/12/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY
. Again, some explain/show combos may clear up things.

> Seems like my mistake was creating that table with a primary key. But the query itself without inserting into anything should've been fast then, which it wasn't. I'll remember your trick of creating the primary key afterwards and will try just select name1 from games to see how it goes.

The PK stuff is bulk-loading 101. Try explain AND explain analyze of
some variants, remember to analyze your tables ( seems redundant, but
the PK & redundant hash key stuff leads me to think you are not too
experienced on postgres usage ).

Francisco Olarte.

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2020-07-20 17:58:19 Re: Improvement for query planner? (no, not about count(*) again ;-))
Previous Message Jehan-Guillaume de Rorthais 2020-07-20 12:09:24 Re: Buffers from parallel workers not accumulated to upper nodes with gather merge

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2020-07-20 14:45:31 Re: Logical replication from 11.x to 12.x and "unique key violations"
Previous Message Christophe Pettus 2020-07-20 14:34:57 Re: How to restore a dump containing CASTs into a database with a new user?