Re: Slow query - lots of temporary files.

From: Johann Spies <johann(dot)spies(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow query - lots of temporary files.
Date: 2015-06-12 15:04:48
Message-ID: CAGZ55DR2T6PVA8Zf4R-G5VXz25Q4VLGXVATJfWL2y0hwGD40CQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 10 June 2015 at 16:50, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

>
>
> The problematic piece of the explain plan is this:
>
> -> Merge Join (cost=4384310.92..21202716.78 rows=6664163593
> width=390)"
> Output: a.ut, c.gt, b.go, b.gn, d.au"
> Merge Cond: ((c.ut)::text = (d.rart_id)::text)"
>
> That is, the planner expects ~6.7 billion rows, each ~390B wide. That's
> ~2.5TB of data that needs to be stored to disk (so that the sort can
> process it).
>
> The way the schema is designed might be one of the issues - ISTM the 'ut'
> column is somehow universal, mixing values referencing different columns in
> multiple tables. Not only that's utterly misleading for the planner (and
> may easily cause issues with huge intermediate results), but it also makes
> formulating the queries very difficult. And of course, the casting between
> text and int is not very good either.
>
> Fix the schema to follow relational best practices - separate the values
> into multiple columns, and most of this will go away.
>

Thanks for your reply Tomas.

I do not understand what the problem with the 'ut' column is. It is a
unique identifier in the first table(africa_uts) and is used in the other
tables to establish joins and does have the same type definition in all the
tables. Is the problem in the similar name. The data refers in all the
'ut' columns of the different tables to the same data. I do not casting of
integers into text in this case. I don't know why the planner is doing
it. The field 'rart_id' in isi.rauthor is just another name for 'ut' in
the other tables and have the same datatype.

I do not understand your remark: "separate the values into multiple
columns". I cannot see which values can be separated into different columns
in the schema. Do you mean in the query? Why?

Johann
--
Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Sheena, Prabhjot 2015-06-12 17:57:22 pg bouncer issue what does sv_used column means
Previous Message Sasa Vilic 2015-06-12 00:18:29 Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated