Re: PostgreSQL and VIEWS

From: Misa Simic <misa(dot)simic(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL and VIEWS
Date: 2013-03-25 21:32:50
Message-ID: CAH3i69kG1ihFqUQ+Vpb954ibGduG63cR4W-ZEpSrgawqf9b1eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Merlin,

Well... sorry, It could be and my bad english... but let me explain
chronologicaly things...

I have first written concrete case...

http://postgresql.1045698.n5.nabble.com/PostgreSQL-planner-tp5749427.html

But because of I recognized the pattern - always is problem with JOIN to a
view...

I have written this abroad generic question.... Because of, I think,
Postgres have problem with JOIN to a view in general...So probably someone
before me have had the same problem - and if that is the case I just wanted
to hear thier solution...

But from others examples, and some tests EXPLAIN ANALYZE I have done...

i.e. SELECT t1.a FROM t1 LEFTJOIN t2 USING (a)

Planer includes some actions related to t2 - what are not necessary at
all... again - it is just my opinion :)
(Please, don't take this - I don't know... as some most important thing...)

So that are "small" problems - on our simplified examples - what have big
impact in performance on a bit complex examples...

So what we have indentified until know - solution to our problem with views
- is always: "rephrase the question" (not indexes - they exist - just not
used...)

for example:

SELECT view.* FROM view INNER JOIN t1 USING (col1) WHERE t1.col2 = 1

to get better performance, you need to say:

SELECT view.* FROM view WHERE col1 = (SELECT t.col1 FROM t1 WHERE t1.col2 =
1)

Logically - that are the same questions - result is the same.... just from
some unknown reason to me - postgres in first case picks wrong plan - and
we got very bad performance... :(

So solution to our problem - is to add "rephrase the question" tier...
(analyze what "input question" is - and transform it to better SQL for
Postgres"

And fortunately we have that flexibility in our app... And how the things
are - we will need to do it,,,.

So, if input question is: SELECT t1.a FROM t1 LEFT JOIN t2 USING (a) -
transform it to: SELECT t1.a FROM t1

etc...

But, don't you think that would be better for Postgres planner in general?

Nowhere in our examples are cases like where A || B = 'x' ... or WHERE
volatile_function(a, b) = 5... etc...

Materalisation - well that is another thing why we use VIEWS....

So for calculated properties of the things - we use SQL stable functions...

i.e the thing: Customer - is the VIEW in postgres:

cust_id, cust_name, blablabla...columns, customer_balance

customer_balance - is calculated property of Entity: Customer - postgres
function actually...

if custmer_balabce sucks in perform - involve mat view - and the function
will return value from mat_view - instead of to do calculation... But
again - we dont have the problem with that :)

We have the problem with:

SELECT c.* FROM customers_view c INNER JOIN invoices USING (customer_id)
WHERE invoice_id = 156

And solution to our problem is: "rephrase the question" :)

Kind Regards,

Misa

2013/3/25 Merlin Moncure <mmoncure(at)gmail(dot)com>

> On Sat, Mar 23, 2013 at 9:25 PM, Misa Simic <misa(dot)simic(at)gmail(dot)com> wrote:
> > HI,
> >
> > When I have met PostgreSQL for a first time - I have been really amazed -
> > with many things...
> >
> > But how we started to use it - and data jumps in - we meet performance
> > problems...
> >
> > Now, it is a bit tricky... any concrete performance problem - can be
> solved
> > on some way...
> >
> > However, I am more concerned, that "our way" how we do the things - is
> not
> > best way for Postgres...
> >
> > The thing is - our tables are highly normalised - so lot of joins...
> (NULLS
> > do not exist in the tables)
> >
> > to make things simplier we use VIEWS to "denormalise" data - though it is
> > not that much we care about denormalisation - it is more to make things
> > simplier and less error prone...
> >
> > So every "thing" (Customer, product, transaction, employee, whatever) is
> > built up from more tables... I am not sure we have even one "thing"
> built up
> > just from 1 table...
> >
> > Lot of "thing properties" are actually calculations: i.e. Invoice
> Amount, or
> > for Employee:
> > We have in the one table: first_name and last_name fields, but Full name
> is
> > concatented as:
> >
> > last_name || ',' || first_name
> >
> > So, whenever we need employee full name - is it on Employee Info window
> - or
> > in an Invoice as Salesperson, or in Payslip report...instead of to
> > everywhere have above formula - we have function...
> > but again, instead of to developer think about each possible function for
> > concrete thing - we have made for each entity - the view... what have all
> > relevant joins - and all relevant calculated properties... about the
> > thing...
> >
> > I have had a thought - somewhere read (but now I am more convenient I
> have
> > read it wrong) that planner is that smart that if we we have:
> >
> > CREATE VIEW person_view AS
> > SELECT person_id, first_name, last_name, fn_full_name(id) as full_name,
> > date_of_birth, age(date_of_birth) from person LEFT JOIN
> > person_date_of_birth USING (person_id)
> >
> > SELECT first_name FROM person_view WHERE id = 1
> >
> > Planner is smart and:
> > -will not care about joins - you ask for the field(s) - what do not
> belong
> > to other tables - both belong to 1 - and it is pk!
> > -will not care about functions - you havent asked for any field what is
> > function in your query...
> >
> > However - how we met more and more problems with our approach... and
> > spending more time on
> > EXPLAIN ANALYZE - instead of on business problems... It seems things are
> not
> > that good...
> >
> > for simple questions as above - results are acceptable - even looking
> into
> > EXPLAIN ANALYZE i would not say it is the best possible plan... (i.e.
> > planner spending time on Seq Scan on person_date_of_birth_table - and
> filter
> > it - even no need to think about that table at all - LEFT JOIN (show me
> > columns - if there is a matching row for pk column) - so could be check
> via
> > index -however - there is no any column from that table in the query - I
> > would simple discard that table from plan....
> >
> > So query
> >
> > SELECT id FROM view WHERE id = 5 (view is SELECT * FROM table1 LEFT JOIN
> > table2)
> >
> > I would treat the same as:
> > SELECT id FROM table1 = 5
> >
> > ok in INNER JOIN it requires additional confimration - but even there is
> FK
> > to PK join - that confirmation is not needed iether - but in our cases
> it is
> > always FK to PK...
> >
> > However - if we need to involve more "entities"/views - from some unknown
> > reason to me - postgres always picks bad plan...
> >
> > i.e. list of employees what work in New York
> >
> > we have employees_contract table:
> > contract_id, person_id, department_id,
> >
> > a lot of others tables, but to make it shorter:
> >
> >
> > Department_view
> >
> > Buidlings_view
> >
> >
> > and now query:
> > SELECT full_name FROM person_view INNER JOIN emplyee_contract USING
> > (person_id) INNER JOIN department_view USING (department_id) INNER JOIN
> > buildings_view USING (building_id) WHERE city_id = 'NY'
> >
> >
> > from some unknown reason - gives bad plan - then if we "refactor" query
> and
> > send different question - we get good result... I am pretty sure planner
> > should be capable to "rephrase" my question instead of me...
> >
> > I would like to hear your expirience with VIEWS in postgres... And some
> kind
> > of best practice/advice for described situation... So far it looks to me
> > there is no way - to make things ready for any specific question - every
> > "request" will need specific SQL syntax to drive planner in acceptable
> > direction...
>
>
> You asked some broad questions so you are going to get broad answers.
>
> *) query planner is very complicated and changes are very incremental.
> only a very, very small number of people (Tom especially) are capable
> of making major changes to it. some known planner issues that might
> get fixed in the short term are better handling of quals through UNION
> ALL and/or pushing quals through partitioned windows functions. these
> are documented shortcomings -- other improvements have to be looked
> through the lens of 'what else did you break', including,
> unfortunately, plan time.
>
> *) filtering in predicate expressions (select * from foo where a || b
> = 'x') is going to always suck. consider optimizing via indexes
> and/or materialization
>
> *) excessive use of views is for data abstraction (a technique I
> advocate) is on one side of a tradeoff between materlization and run
> time performance. if you are exceeding your ability to quickly write
> clean queries, perhaps it's time to start thinking about some
> materialization steps. especially you should be asking yourself if
> users are needing 100% up to date data (if not, materialization
> requirements relax significantly)
>
> *) better, more experienced programmers write faster, cleaner code.
> this is true in all backends. @ two years in, you have some learning
> to do that -hackers will not be able to solve for you.
>
> merlin
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2013-03-25 23:13:00 Re: PostgreSQL and VIEWS
Previous Message Lonni J Friedman 2013-03-25 20:25:48 Re: replication behind high lag