Re: PGSQL 9.3 - Materialized View - multithreading

From: "Graeme B(dot) Bell" <grb(at)skogoglandskap(dot)no>
To: Nicolas Paris <niparisco(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: PGSQL 9.3 - Materialized View - multithreading
Date: 2014-04-07 12:59:25
Message-ID: D8DF4544-81A4-4465-8AD4-25DA3CC20596@skogoglandskap.no
Lists: pgsql-performance


Hi again Nick.

Glad it helped.

Generally, I would expect that doing all the A's first, then all the B's, and so on, would be fastest since you can re-use the data from cache.
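
If your view names encode the base table, a plain sort will put the commands in that order. A minimal sketch, assuming names like mv_a_001 ... mv_f_600 and the command file (3600commands) from my earlier mail quoted below -- adjust to your real naming:

# group the refresh commands so all views on A run first, then B, and so on
sort 3600commands > 3600commands_grouped
parallel -j 8 --progress < 3600commands_grouped > /dev/null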

Concurrency when reading isn't generally a problem. Lots of things can read at the same time and it will be nice and fast.
It's concurrent writes or concurrent read/write of the same data item that causes problems with locking. That shouldn't be happening here, judging by your description.

If possible, try to make sure nothing is modifying those source tables A/B/C/D/E/F when you are doing your view refresh.

Graeme.

On 07 Apr 2014, at 14:49, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:

> Hello,
> Thanks for this clear explanation !
>
> Then I have a sub-question :
> Suppose I have 3600 materialised views, say 600 mat views built on each of 6 main tables (A,B,C,D,E,F are repeated 600 times with some differences).
> Is it faster to:
> 1) parallel refresh 600 times A, then 600 times B, etc.,
> OR
> 2) parallel refresh 600 times A,B,C,D,E,F
>
> I guess 1) is faster because there are 600 accesses to the same table, which is already loaded in memory? But does parallel access to the same table imply
> concurrency and bad performance?
>
> Thanks
>
> Nicolas PARIS
>
>
> 2014-04-07 12:29 GMT+02:00 Graeme B. Bell <grb(at)skogoglandskap(dot)no>:
> On 04 Apr 2014, at 18:29, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:
>
> > Hello,
> >
> > My question is about multiprocess and materialized View.
> > http://www.postgresql.org/docs/9.3/static/sql-creatematerializedview.html
> > I (will) have something like 3600 materialised views, and I would like to know the way to refresh them in a multithreaded way
> > (understand: 8 CPU cores -> 8 refresh processes at the same time)
>
> Hi Nick,
>
> out of DB solution:
>
> 1. Produce a text file which contains the 3600 refresh commands you want to run in parallel. You can do that with select and format() if you don't have a list already.
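>
> For example, something like this in psql (a sketch only, not tested; I'm assuming one command per view listed in pg_matviews and a database called mydb -- adapt to your setup):
>
> -- \t and \a strip headers and alignment, \o redirects query output to a file
> \t
> \a
> \o 3600commands
> select format($$echo 'REFRESH MATERIALIZED VIEW %I.%I;' | psql mydb$$, schemaname, matviewname)
> from pg_matviews order by matviewname;
> \o
>
> Each line of that file is then one shell command that parallel can pick up.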
>
> 2. I'm going to simulate your 3600 'refresh' commands here with some select and sleep statements that finish at unknown times.
>
> (In BASH):
> for i in {1..3600} ; do echo "echo \"select pg_sleep(1 + (random()*10)::int); select $i\" | psql mydb" ; done > 3600commands
>
> 3. Install Gnu Parallel and type:
>
> parallel < 3600commands
>
> 4. Parallel will automatically work out the appropriate number of cores/threads for your CPUs, or you can control it manually with -j.
> It will also give you a live progress report if you use --progress.
> e.g. this command balances 8 jobs at a time, prints a dynamic progress report and dumps stdout to /dev/null
>
> parallel -j 8 --progress < 3600commands > /dev/null
>
> 5. If you want to make debugging easier use the parameter --tag to tag output for each command.
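>
> e.g. this keeps each command's output in a log, with every line prefixed by the command that produced it (a sketch -- choose whatever log file suits you):
>
> parallel -j 8 --tag --progress < 3600commands > refresh_output.log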
>
> Of course it would be much more elegant if someone implemented something like Gnu Parallel inside postgres or psql ... :-)
>
> Hope this helps & have a nice day,
>
> Graeme.
>
