Re: Very big insert/join performance problem (bacula)

From: Marc Cousin <cousinmarc(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Richard Huxton <dev(at)archonet(dot)com>, Marc Cousin <mcousin(at)sigma(dot)fr>
Subject: Re: Very big insert/join performance problem (bacula)
Date: 2009-07-14 09:22:16
Message-ID: 200907141122.16886.cousinmarc@gmail.com
Lists: pgsql-performance

On Tuesday 14 July 2009 10:23:25, Richard Huxton wrote:
> Marc Cousin wrote:
> > Temporarily I moved the problem at a bit higher sizes of batch by
> > changing random_page_cost to 0.02 and seq_page_cost to 0.01, but I feel
> > like an apprentice sorcerer with this, as I told postgreSQL that fetching
> > rows from disk are much cheaper than they are. These values are, I think,
> > completely abnormal.
>
> They certainly don't have anything to do with reality. Try putting them
> back to (say) seq_page_cost=1 and random_page_cost=2.

That's the first thing I tried (it seemed the more sensible choice), and it didn't
work. I can't leave them at those values for more than a single test query: the
server was pretty much dying before I changed the settings.
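
For the record, this is roughly how I run that one-off test, so the rest of the
server keeps the current settings (the query itself is omitted here, it's just a
per-session sketch):

-- Override the planner cost settings for this session only; every
-- other backend keeps whatever postgresql.conf currently says.
SET seq_page_cost = 1;
SET random_page_cost = 2;

-- ... run the single test query here ...

-- Afterwards, fall back to the configured values again.
RESET seq_page_cost;
RESET random_page_cost;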

>
> > So, finally, to my questions :
> > - Is it normal that PostgreSQL is this off base on these queries (sorry I
> > don't have the plans, if they are required I'll do my best to get some,
> > but they really are the two obvious plans for this kind of query). What
> > could make it choose the hash join for too small batch tables ?
>
> No point in speculating without plans.

Ok, I'll try to have them tomorrow.
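
For anyone following along, this is the kind of output I'll post; the table and
column names below are placeholders, not the actual bacula batch query:

-- EXPLAIN shows the plan the optimizer picks (hash join or not);
-- EXPLAIN ANALYZE also executes the query and prints actual row counts
-- and timings next to the estimates, which is what exposes misestimates.
EXPLAIN ANALYZE
SELECT b.*
FROM batch b
JOIN path p ON p.path = b.path;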

>
> > - Is changing the 2 costs the way to go ?
>
> Not the way you have.

That's what I thought, and the reason I posted :)

>
> > - Is there a way to tell postgreSQL that it's more costly to sort than it
> > thinks ? (instead of telling it that fetching data from disk doesn't cost
> > anything).
>
> That's what the configuration settings do. But if you put a couple way
> off from reality it'll be pure chance if it gets any estimates right.
>
> > Here are the other non-default values from my configuration :
> >
> > shared_buffers = 2GB
> > work_mem = 64MB
>
> Set this *much* higher when you are running your bulk imports. You can
> do it per-connection. Try 256MB, 512MB, 1GB (but keep an eye on total
> memory used).

I'll try that. But anyhow, I've got much better performance when not doing the
hash join. I'll get back with the plans as soon as possible.
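
A minimal sketch of the per-connection approach you describe (the value is only
an example; total memory use is roughly work_mem per sort/hash node per backend,
so it needs watching):

-- Raise work_mem only for the bulk-import connection; the 64MB value
-- from postgresql.conf stays in effect for every other session.
SET work_mem = '512MB';

-- ... run the batch insert / join here ...

-- Optionally go back to the configured value within the same session.
RESET work_mem;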

>
> > maintenance_work_mem = 256MB
> > max_fsm_pages = 15000000 # There are quite big deletes with bacula ...
> > effective_cache_size = 800MB
>
> See other emails on this one.
>
> > default_statistics_target = 1000
>
> Probably don't need this for all columns, but it won't cause problems
> with these queries.
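
If the high target is only needed for the columns involved in these joins, it
can also be set per column instead of globally; the table and column names here
are purely illustrative, not the actual bacula schema:

-- Keep default_statistics_target at its default and raise the sample
-- size only on the columns whose estimates matter for these joins.
ALTER TABLE batch ALTER COLUMN path SET STATISTICS 1000;
ALTER TABLE batch ALTER COLUMN name SET STATISTICS 1000;

-- Re-analyze so the larger statistics target actually takes effect.
ANALYZE batch;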
