From: | Alban Hertroys <haramrae(at)gmail(dot)com> |
---|---|
To: | Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au> |
Cc: | pgsql-general list <pgsql-general(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com> |
Subject: | Re: Many, many materialised views - Performance? |
Date: | 2013-10-09 10:05:33 |
Message-ID: | DEBCDB07-CC8F-4758-A501-34543324C578@gmail.com |
Lists: | pgsql-general |
On Oct 9, 2013, at 4:08, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au> wrote:
>
>> In this instance, we have a lot of queries that build certain aggregate
>> results, which are very slow. The queries were initially all implemented
>> as views, but then we started doing a type of materialising of our own,
>> turning them into tables with CREATE TABLE AS SELECT ....
>> This does make the results very fast to access now, but the side effect
>> is a vast number of (very small) tables.
>
> If you have multiple tables with identical layout but different
> subsets of the data, you will probably get better performance by
> putting them into a single table with indexes which allow you to
> quickly search the smaller sets within the table.
I was thinking just that while reading Toby's message. For example, you could put the results of several related aggregations into a single materialized view, if they share the same key columns (year, month, factory or something similar).
I'm not sure the new built-in materialized views can be maintained like that, though, unless you manage to combine those aggregations into a single monster query, and that will probably not perform well...
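To make the single-table idea a bit more concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely so that it runs anywhere; it is not PostgreSQL, and the table and column names (production, production_summary, units, defects) are made up for illustration. In PostgreSQL 9.3 the CREATE TABLE ... AS SELECT step could presumably be a CREATE MATERIALIZED VIEW instead, kept current with REFRESH MATERIALIZED VIEW, which recomputes the whole view.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE production (
        year INTEGER, month INTEGER, factory TEXT,
        units INTEGER, defects INTEGER
    );
    INSERT INTO production VALUES
        (2013,  9, 'A', 100, 3),
        (2013,  9, 'A', 150, 1),
        (2013,  9, 'B',  80, 0),
        (2013, 10, 'A', 120, 2);

    -- One combined table instead of one small table per aggregate:
    -- several aggregates share the key columns (year, month, factory).
    CREATE TABLE production_summary AS
    SELECT year, month, factory,
           SUM(units)   AS total_units,
           SUM(defects) AS total_defects,
           COUNT(*)     AS batches
    FROM production
    GROUP BY year, month, factory;

    -- Index on the shared key columns so one small subset is found quickly.
    CREATE INDEX production_summary_key
        ON production_summary (year, month, factory);
""")


def summary_for(year, month, factory):
    """Return (total_units, total_defects, batches) for one key, or None."""
    return conn.execute(
        "SELECT total_units, total_defects, batches "
        "FROM production_summary "
        "WHERE year = ? AND month = ? AND factory = ?",
        (year, month, factory),
    ).fetchone()


print(summary_for(2013, 9, "A"))
```

Whether one such query stays fast enough with many aggregates is exactly the open question; the sketch only shows the layout, not the performance.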
What we tend to do at work (no PostgreSQL, unfortunately) is to use external tools to combine those aggregated results and store them back into the database (which we often need to do anyway, as we deal with several databases on several servers).
Additionally, if you have that many tables, it sounds like you partitioned your data.
With aggregated results, the need for partitioning is much less (or perhaps it isn't even needed at all). And perhaps you don't even need the data from all partitions; say if you have monthly partitions of data, do you really need aggregated results from 5 years ago?
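As a rough illustration of aggregating only the recent data, the sketch below rebuilds a summary from rows at or after a cutoff year. Again this is Python with sqlite3 just to keep it runnable, and the names (measurements, recent_summary, build_recent_summary) and the year-based cutoff are assumptions for the example, not anything from Toby's setup.

```python
import sqlite3

# Hypothetical data: names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurements (year INTEGER, month INTEGER, value INTEGER);
    INSERT INTO measurements VALUES
        (2008, 10,  5),
        (2012, 11,  7),
        (2013,  9, 10),
        (2013,  9, 20),
        (2013, 10, 30);
    CREATE TABLE recent_summary (year INTEGER, month INTEGER, total INTEGER);
""")


def build_recent_summary(first_year):
    """Rebuild the summary from rows dated first_year or later only."""
    with conn:  # run the delete and the insert in one transaction
        conn.execute("DELETE FROM recent_summary")
        conn.execute(
            "INSERT INTO recent_summary "
            "SELECT year, month, SUM(value) "
            "FROM measurements WHERE year >= ? "
            "GROUP BY year, month",
            (first_year,),
        )
    return conn.execute(
        "SELECT year, month, total FROM recent_summary ORDER BY year, month"
    ).fetchall()


print(build_recent_summary(2012))
```

The 2008 row never reaches the summary, so old partitions would not need to be scanned on every rebuild.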
That said, users excel in finding data to request that you thought they wouldn't need.
Which brings me to another question: Do your users really need the data from all those views or do they only think they need that?
Frequently, users create elaborate Excel sheets and then request tons of data to fill them, while what they're really interested in is the _result_ of that Excel sheet. If you can provide them with that, they're happy and you can rest assured that they're at least using correct results. Plus, it removes some of _this_ burden from your database.
I've seen users spend 2 weeks, twice a year, building sheets like that to produce data that I can turn into a report for them in a couple of days; that report takes 2 minutes to load (which is long, but not compared to their 2 weeks).
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.