Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2

From: Rodrigo Rosenfeld Rosas <rr(dot)rosas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
Date: 2012-11-08 15:30:39
Message-ID: 509BD01F.6030602@gmail.com
Lists: pgsql-performance

On 07-11-2012 22:58, Tom Lane wrote:
> Rodrigo Rosenfeld Rosas<rr(dot)rosas(at)gmail(dot)com> writes:
>> Ok, I could finally strip part of my database schema that will allow you
>> to run the explain query and reproduce the issue.
>> There is a simple SQL dump in plain format that you can restore both on
>> 9.1 and 9.2 and an example EXPLAIN query so that you can see the
>> difference between both versions.
>> Please keep me up to date with regards to any progress. Let me know if
>> the commit above fixed this issue.
> AFAICT, HEAD and 9.2 branch tip plan this query a bit faster than 9.1
> does.

Great! What is the estimated date for the 9.2.2 release?

> It does appear that the problem is the same one fixed in that
> recent commit: the problem is you've got N join clauses all involving
> t.id and so there are lots of redundant ways to use the index on t.id.

And why does fast.sql perform so much better than slow.sql? Is it
possible to optimize the planner so that both fast.sql and slow.sql
finish in about the same time?

> I've got to say though that this is one of the most bizarre database
> schemas I've ever seen.

Merlin seems to share your opinion on that. I'd love to try a different
database design when I have a chance.

What would you guys suggest for handling my application requirements?

The only reason it is bizarre is that I have no idea how to simplify
our database design much further with a relational database. And hstore
doesn't sound like a reasonable option for our requirements either.

The only other option I can think of is to stop splitting
transaction_condition into many tables (one per data type). Then I'd
need to include all possible value columns in transaction_condition,
and I'm not sure whether that would perform better, nor what the
implications would be for database size, since most columns would be
NULL in each record. It also introduces another issue: I would need a
trigger to validate each record upon insertion, to avoid creating
records with all value columns set to NULL, for instance. Currently
each of the separate tables that store the values has not-null
constraints, among others, to prevent this kind of problem. Triggers
are more complicated to maintain, especially because we're used to
working with an ORM (except for this particular case, where I generate
the SQL query manually instead of going through the ORM).
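
Just to make the idea concrete, this is roughly what I have in mind
(the table layout and column names below are only illustrative, not our
actual schema):

CREATE TABLE transaction_condition (
    id             serial PRIMARY KEY,
    transaction_id integer NOT NULL,
    type_id        integer NOT NULL,
    boolean_value  boolean,
    string_value   text,
    numeric_value  numeric,
    date_value     date
);

-- the validation trigger mentioned above, rejecting rows where every
-- value column is NULL
CREATE FUNCTION check_condition_has_value() RETURNS trigger AS $$
BEGIN
  IF NEW.boolean_value IS NULL AND NEW.string_value IS NULL
     AND NEW.numeric_value IS NULL AND NEW.date_value IS NULL THEN
    RAISE EXCEPTION 'transaction_condition row has no value set';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ensure_condition_has_value
  BEFORE INSERT OR UPDATE ON transaction_condition
  FOR EACH ROW EXECUTE PROCEDURE check_condition_has_value();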

Also, we migrate the database using standalone_migrations:

https://github.com/thuss/standalone-migrations

If we change a single line of the trigger code, it won't be easy to see
which line changed in the commit that introduces the change, because we
would have to create a separate migration that redefines the trigger
with all of its code repeated.
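
For example, every such migration would have to restate the entire
function just to change one line inside it, something along these lines
(a sketch only, reusing the hypothetical function from above):

CREATE OR REPLACE FUNCTION check_condition_has_value() RETURNS trigger AS $$
BEGIN
  -- ... the full trigger body repeated here, including the one changed line ...
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;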

> It seems to be sort of an unholy combination of
> EAV and a star schema. A star schema might not actually be a bad model
> for what you're trying to do, but what you want for that is one big fact
> table and a collection of *small* detail tables you join to it (small
> meaning just one entry per possible value). The way this is set up, you
> need to join two or three tables before you can even join to the main
> fact table - and those tables don't even have the virtue of being small.
> That's never going to perform well.

If I understand correctly, you're suggesting that I drop
transaction_condition(id, transaction_id, type_id), replace
condition_boolean_value(id, condition_id, value) with
condition_boolean_value(id, transaction_id, type_id, value), and repeat
the same idea for the other value tables.

Is that right? Would that perform much better? If you think so, I could
try this approach when I find some time. But I'd also need to
denormalize other related tables I didn't send in the schema dump. For
instance, the document snippets also have a condition_id column. Each
field value (transaction_condition) can have multiple contract snippets
in a table called condition_document_excerpt(id, document_id,
condition_id, "position"). I'd need to remove condition_id from it and
append transaction_id and type_id, just like in the value tables. No
big deal if that would speed up our queries.
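
In other words, something along these lines (again just a sketch, with
illustrative column types and names):

-- each value table would point directly at the transaction and field type
CREATE TABLE condition_boolean_value (
    id             serial PRIMARY KEY,
    transaction_id integer NOT NULL,
    type_id        integer NOT NULL,
    value          boolean NOT NULL
);

-- and condition_document_excerpt would be reshaped the same way
CREATE TABLE condition_document_excerpt (
    id             serial PRIMARY KEY,
    document_id    integer NOT NULL,
    transaction_id integer NOT NULL,
    type_id        integer NOT NULL,
    "position"     integer
);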

Am I missing something?
