From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Szymon Lipiński <mabewlun(at)gmail(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Thoughts on "Love Your Database"
Date: 2016-05-21 12:05:49
Message-ID: CAKt_ZftYDp+FHyKn-QTcgHD6M2KkPQjvUKFyo_LmHRXxKBSb8Q@mail.gmail.com
Lists: pgsql-general

On Sat, May 21, 2016 at 12:49 PM, Szymon Lipiński <mabewlun(at)gmail(dot)com>
wrote:

>
>
> Strangely I have never had a problem testing stored procedures. You have
>> to create a data set for the tests of course and that is the hardest part,
>> but there are some really nice things:
>>
>> 1. If your test scripts always roll back you can run them on a
>> production database as a troubleshooting step
>> 2. It is easy to hook things up to a TAP harness (whether using PgTAP or
>> some hand-rolled solution). I think it would be harder to connect to xunit
>> though. So use TAP ;-)
>> 3. I usually create a test results table (in my test case, rolled back
>> after!) which stores the test description and pass status. That makes it
>> easy to check using other tools.
>>
>> Usually I set aside a range of things (negative id's for example) for
>> testing purposes.
>>
>>
> I had problems, and I'm really interested in making it work for me. I have
> a couple of questions:
> How do you manage versioning of the stored procedures? Especially do you
> have any problems upgrades?
>

We reload all stored procedures on every upgrade. I have seen a lot of
projects do this when they make heavy use of stored procedures. Versioning
then happens via an external version control tool (git, mercurial, svn...).

Transactional DDL makes this pretty seamless.
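
For what it's worth, a minimal sketch of what one of those reload files can
look like (the function and table names here are made up for illustration):

    -- balance.sql: rerun in full on every upgrade
    BEGIN;

    -- CREATE OR REPLACE swaps in the new definition; if anything in
    -- the file fails, transactional DDL rolls the whole thing back.
    CREATE OR REPLACE FUNCTION get_balance(in_account_id int)
    RETURNS numeric LANGUAGE sql STABLE AS
    $$
        SELECT coalesce(sum(amount), 0)
          FROM ledger_lines
         WHERE account_id = in_account_id;
    $$;

    COMMIT;

The whole file either applies or it doesn't, so a failed upgrade never
leaves you with half-replaced procedures.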

> What about testing logic which is outside the database? Do you use pgtap
> for testing the schema only, or to test some of the external logic as well?
>

I actually use a TAP harness with a home-brewed interface for attaching to
the stored procedures. I threw it together in a couple of hours (the
translation layer is in Perl). It isn't perfect, but it generally works.
We could use pgTAP and avoid the translation layer, but we didn't want to
add it as a build/test dependency there. We are considering using pgTAP for
schema tests that are not intended to be run on other systems.
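
For reference, a pgTAP schema test is itself just a SQL script run through
a TAP harness such as pg_prove; a small sketch (the table, column, and
function names are hypothetical):

    BEGIN;
    SELECT plan(3);

    SELECT has_table('public', 'accounts', 'accounts table exists');
    SELECT has_column('public', 'accounts', 'id', 'accounts.id exists');
    SELECT has_function('get_balance', 'get_balance() is defined');

    SELECT * FROM finish();
    ROLLBACK;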

> Do you use logic inside and outside the database at the same time?
>

Most of the projects I have worked on that have done stored proc testing
have been Perl projects and the typical approach there is to use a TAP
harness for testing. The stored procedures are in their own test script.
We have other tests (including integration tests) in other test scripts.
For LedgerSMB our db tests can be run against a live production db as can
some of our integration tests, but others (those that rely on actual http
queries) cannot be.

But these are all integrated in the test framework so when you do:

make test

your environment variables determine what is tested (and it may include
database logic).

I don't know how it would work with xunit, but the same basic approach we
use could be applied elsewhere.

Basically, our test adaptor runs scripts, each of which:

1. Loads test data and creates a schema for storing test results
2. Runs tests, storing the results in the test_results table
3. Outputs a report of test results
4. Rolls back

That output is then parsed and converted into test output for our test
framework.
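
To make that concrete, a stripped-down test script along those lines might
look like this (schema and function names are illustrative only; the
negative id is one of the values reserved for testing):

    BEGIN;

    -- 1. load test data and create somewhere to store results
    CREATE TEMPORARY TABLE test_results (description text, success bool);
    INSERT INTO accounts (id, name) VALUES (-1, 'test account');

    -- 2. run tests, storing results in test_results
    INSERT INTO test_results
    SELECT 'get_balance is zero for a fresh account',
           get_balance(-1) = 0;

    -- 3. output a report of the results
    SELECT * FROM test_results;

    -- 4. roll back, leaving production data untouched
    ROLLBACK;

The harness just runs the script, reads the report from step 3, and turns
each row into a pass/fail line.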

> How does this scale to a couple of servers when the load is so huge you
> need to have e.g. ten physical web servers at front?
>

If you are asking about scalability testing and load testing, you aren't
going to use the same approaches you do for unit tests. When I have had to
do these, I have typically had to write tools to run, for example, a stored
procedure with different arguments many times in parallel. Obviously you
don't do this on a production system under load.
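
As a sketch of the kind of tool I mean, pgbench can replay a custom script
from many connections at once (the function and id range are made up;
\setrandom is the pgbench syntax current as of 9.5):

    -- bench_get_balance.sql
    \setrandom account_id 1 100000
    SELECT get_balance(:account_id);

Run with something like:

    pgbench -n -f bench_get_balance.sql -c 32 -j 4 -T 60 testdb

which keeps 32 client connections calling the procedure for 60 seconds,
against a test database rather than production.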

As for other environments, I have never seen one that didn't have staging
and testing environments, and that's where you would initially look at
some of these questions.

Now, this being said, load affects database query performance in ways that
aren't always what one would expect (for example, what is in the cache can
affect things quite a lot).

> It seems for me that it would be easier to spread the cpu logic overhead
> to plenty of servers instead of having just one machine which needs to do
> all the things. But maybe I'm wrong.
>

But that is a different question. If you are dealing with large databases
under load (say, an 11TB database which serves a web front-end plus an HPC
cluster of many nodes, totalling a commitment to handle up to 500 CPU cores
of client applications running 24/7), performance in your database is
something you will have to pay very close attention to. Some things you
will want to handle outside the database, to be sure. For other things,
you will want to make sure the logic is well contained in the database.

You can't assume that pulling the data to the client and processing it
there always improves scalability. Concerns such as effective use of the
buffer cache can make it very important to get the logic in the right
place. Additionally, some things in PostgreSQL, like deTOASTing data, can
add a lot of overhead that you might be able to avoid with the right
functions. If you need solid scalability, you need people who understand
the performance costs of the various options and can weigh them in a
particular context.
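
As one contrived example (table, column, and function names are
hypothetical): if clients only ever need the first part of a large text
column, a server-side function can return just that slice, so only the
slice crosses the wire rather than the whole deTOASTed value:

    CREATE OR REPLACE FUNCTION doc_excerpt(in_id int, in_chars int)
    RETURNS text LANGUAGE sql STABLE AS
    $$
        SELECT left(body, in_chars) FROM documents WHERE id = in_id;
    $$;

Whether that wins in a given workload is exactly the sort of thing someone
has to measure and weigh in context.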

>
>
> --
> regards Szymon Lipiński
>

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more
