From: | Dan Langille <dan(at)langille(dot)org> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Running lots of inserts from selects on 9.4.5 |
Date: | 2016-02-10 10:13:12 |
Message-ID: | 43C0F377-7A27-455A-B695-D3EA92B81128@langille.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
> On Feb 10, 2016, at 2:47 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> On Tue, Feb 9, 2016 at 4:09 PM, Dan Langille <dan(at)langille(dot)org> wrote:
>> I have a wee database server which regularly tries to insert 1.5 million or
>> even 15 million new rows into a 400 million row table. Sometimes these
>> inserts take hours.
>>
>> The actual query to produces the join is fast. It's the insert which is
>> slow.
>>
>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
>> DeltaSeq)
>> SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
>> Filename.FilenameId, batch_testing.LStat, batch_testing.MD5,
>> batch_testing.DeltaSeq
>> FROM batch_testing JOIN Path ON (batch_testing.Path = Path.Path)
>> JOIN Filename ON (batch_testing.Name =
>> Filename.Name);
>>
>> This is part of the plan: http://img.ly/images/9374145/full created via
>> http://tatiyants.com/pev/#/plans
>>
>> This gist contains postgresql.conf, zfs settings, slog, disk partitions.
>>
>> https://gist.github.com/dlangille/33331a8c8cc62fa13b9f
>
> The table you are inserting into has 7 indexes, all of which have to
> be maintained. The index on the sequence column should be efficient
> to maintain. But for the rest, if the inserted rows are not naturally
> ordered by any of the indexed columns then it would end up reading 6
> random scattered leaf pages in order to insert row pointers. If none
> those pages are in memory, that is going to be slow to read off from
> hdd in single-file. Also, you are going dirty all of those scattered
> pages, and they will be slow to write back to hdd because there
> probably won't be much opportunity for write-combining.
>
> Do you really need all of those indexes?
>
> Won't the index on (jobid, pathid, filenameid) service any query that
> (jobid) does, so you can get rid of the latter?
>
> And unless you have range queries on fileindex, like "where jobid = 12
> and fileindex between 4 and 24" then you should be able to replace
> (jobid, fileindex) with (fileindex,jobid) and then get rid of the
> stand-alone index on (fileindex).
>
> If you add an "order by" to the select statement which order by the
> fields of one of the remaining indexes, than you could make the
> maintenance of that index become much cheaper.
I will make these changes one-by-one and test each. This will be interesting.
> Could you move the indexes for this table to SSD?
Now that's a clever idea.
bacula=# select pg_size_pretty(pg_indexes_size('file'));
pg_size_pretty
----------------
100 GB
(1 row)
bacula=# select pg_size_pretty(pg_table_size('file'));
pg_size_pretty
----------------
63 GB
(1 row)
bacula=#
No suprising that the indexes are larger than the data.
The SSD is 30GB. I don't have enough space. Buying 2x500GB SSDs
would allow me to put all the data onto SSD. I'm using about 306G for the
databases now.
> SSD is probably wasted on your WAL. If your main concern is bulk
> insertions, then WAL is going to written sequentially with few fsyncs.
> That is ideal for HDD. Even if you also have smaller transactions,
OK.
> WAL is still sequentially written as long as you have a non-volatile
> cache on your RAID controller which can absorb fsyncs efficiently.
Of note, no RAID controller or non-volatile cache here. I'm running ZFS with plain HBA controllers.
Thank you. I have some interesting changes to test.
--
Dan Langille - BSDCan / PGCon
dan(at)langille(dot)org
From | Date | Subject | |
---|---|---|---|
Next Message | Moreno Andreo | 2016-02-11 18:06:42 | Architectural question |
Previous Message | Jeff Janes | 2016-02-10 07:47:02 | Re: Running lots of inserts from selects on 9.4.5 |