Re: PGSQL 9.3 - Materialized View - multithreading

From: "Graeme B(dot) Bell" <grb(at)skogoglandskap(dot)no>
To: Nicolas Paris <niparisco(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: PGSQL 9.3 - Materialized View - multithreading
Date: 2014-04-07 14:05:11
Message-ID: 821FA4DC-E084-4A5F-A8D0-50722D5B11E3@skogoglandskap.no
Lists: pgsql-performance


- http://wiki.postgresql.org/wiki/Performance_Optimization
- run it on the most powerful machine you can find
- get some more memory
- get a big (512-1TB) SSD drive
- avoid recalculating the same things over and over. If your views have many similar elements, calculate those first into a partial result, then build the final views from the partial result (see the sketch below this list).
- make sure your source tables are fully indexed and have good statistics
- run all the views once with \timing and keep track of how long they took. Fix the slow ones.
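
For example, here is a minimal sketch of the "partial result" idea above (table, column and view names are invented; adjust to your schema):

# build one shared intermediate, then each final view becomes a cheap query on top of it
psql mydb <<'SQL'
CREATE MATERIALIZED VIEW shared_partial AS
  SELECT item_id,
         date_trunc('month', event_date) AS month,
         count(*)                        AS n_events
  FROM   big_source
  GROUP  BY 1, 2;

CREATE INDEX ON shared_partial (item_id, month);
ANALYZE shared_partial;

-- each final view filters/joins shared_partial instead of re-aggregating big_source
CREATE MATERIALIZED VIEW final_view_001 AS
  SELECT * FROM shared_partial WHERE month >= date '2014-01-01';
SQL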

G

On 07 Apr 2014, at 15:56, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:

> Excellent.
>
> Maybe one last sub-question:
>
> Those 3600 mat views do have indexes.
> I guess I will get better performance by dropping the indexes first, then refreshing, then re-creating the indexes.
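> For each view I mean something like this (index, column and view names invented):
>
> psql mydb <<'SQL'
> DROP INDEX IF EXISTS mv_a_001_item_idx;
> REFRESH MATERIALIZED VIEW mv_a_001;
> CREATE INDEX mv_a_001_item_idx ON mv_a_001 (item_id);
> SQL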
>
> Are there other ways to improve performance (such as mat view storage parameters)? This routine will run at night and needs to finish quickly.
>
> Thanks
>
> Nicolas PARIS
>
>
> 2014-04-07 14:59 GMT+02:00 Graeme B. Bell <grb(at)skogoglandskap(dot)no>:
>
> Hi again Nick.
>
> Glad it helped.
>
> Generally, I would expect that doing all the A's first, then all the B's, and so on, would be fastest since you can re-use the data from cache.
>
> Concurrency when reading isn't generally a problem. Lots of things can read at the same time and it will be nice and fast.
> It's concurrent writes or concurrent read/write of the same data item that causes problems with locking. That shouldn't be happening here, judging by your description.
>
> If possible, try to make sure nothing is modifying those source tables A/B/C/D/E/F when you are doing your view refresh.
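>
> e.g. if the view names in your command file start with their source table (just a guess at your naming, something like mv_A_001 ... mv_F_600), you could run one group at a time so each source table stays cached:
>
> # all the A views together, then all the B views, and so on
> for t in A B C D E F ; do
>   grep "mv_${t}_" 3600commands | parallel -j 8 --progress > /dev/null
> done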
>
> Graeme.
>
> On 07 Apr 2014, at 14:49, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:
>
> > Hello,
> > Thanks for this clear explanation !
> >
> > Then I have a sub-question :
> > Suppose I have 3600 materialised views, say 600 mat views built from each of 6 main tables (A,B,C,D,E,F are repeated 600 times with some differences).
> > Is it faster to:
> > 1) refresh the 600 A views in parallel, then the 600 B views, and so on,
> > OR
> > 2) refresh A,B,C,D,E,F in parallel, 600 times over?
> >
> > I guess 1) is faster because there are 600 accesses to the same table, which is already loaded in memory? But does parallel access to the same table imply concurrency
> > and bad performance?
> >
> > Thanks
> >
> > Nicolas PARIS
> >
> >
> > 2014-04-07 12:29 GMT+02:00 Graeme B. Bell <grb(at)skogoglandskap(dot)no>:
> > On 04 Apr 2014, at 18:29, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:
> >
> > > Hello,
> > >
> > > My question is about multiprocessing and materialized views.
> > > http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
> > > I (will) have something like 3600 materialised views, and I would like to know the best way to refresh them in a multithreaded way
> > > (understand: 8 CPU cores -> 8 refresh processes at the same time)
> >
> > Hi Nick,
> >
> > out of DB solution:
> >
> > 1. Produce a text file which contains the 3600 refresh commands you want to run in parallel. You can do that with select and format() if you don't have a list already (see the example query after step 5).
> >
> > 2. I'm going to simulate your 3600 'refresh' commands here with some select and sleep statements that finish at unknown times.
> >
> > (In BASH):
> > for i in {1..3600} ; do echo "echo \"select pg_sleep(1+random()::int*10); select $i\" | psql mydb" ; done > 3600commands
> >
> > 3. Install Gnu Parallel and type:
> >
> > parallel < 3600commands
> >
> > 4. Parallel will automatically work out the appropriate number of cores/threads for your CPUs, or you can control it manually with -j.
> > It will also give you a live progress report if you use --progress.
> > e.g. this command balances 8 jobs at a time, prints a dynamic progress report and dumps stdout to /dev/null
> >
> > parallel -j 8 --progress < 3600commands > /dev/null
> >
> > 5. If you want to make debugging easier use the parameter --tag to tag output for each command.
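> >
> > Going back to step 1: if you simply want to refresh every materialized view in the database (and they have plain lower-case names), one way to generate the command file straight from the catalog is something like:
> >
> > psql mydb -At -c "
> >   SELECT format('echo \"REFRESH MATERIALIZED VIEW %I.%I;\" | psql mydb',
> >                 schemaname, matviewname)
> >   FROM pg_matviews" > 3600commands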
> >
> > Of course it would be much more elegant if someone implemented something like Gnu Parallel inside postgres or psql ... :-)
> >
> > Hope this helps & have a nice day,
> >
> > Graeme.
> >
>
>
