From: | Aren Cambre <aren(at)arencambre(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Postgres refusing to use >1 core |
Date: | 2011-05-23 04:09:49 |
Message-ID: | BANLkTikz3mYua4e1kmTfKkfOxay06oBJjg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Also, thanks for the advice on batching my queries. I am now using a very
efficient bulk data read and write methods for Postgres.
My program bulk reads 100,000 rows, processes those rows (during which it
does a few SELECTs), and then writes 100,000 rows at a time.
It cycles through this until it has processed all 12,000,000 rows.
This, plus the parallelism fix, will probably convert this 30 hour program
to a <2 hour program.
Aren
On Sun, May 22, 2011 at 9:08 AM, Aren Cambre <aren(at)arencambre(dot)com> wrote:
> Just wanted to again say thanks for everyone's help.
>
> The main problem was that my program was running in serial, not parallel,
> even though I thought I used a textbook example of PLINQ. Your assistance
> helped me get to the point where I could conclusively determine everything
> was running in serial. It was more obvious than I realized.
>
> Thanks to help through
> http://stackoverflow.com/questions/6086111/plinq-on-concurrentqueue-isnt-multithreading,
> I have switched to the .NET Framework's Task Parallel Library, and it's
> slamming the 8 cores hard now! And there's a bunch of concurrent connections
> to Postgres. :-)
>
> Aren
>
> On Mon, May 9, 2011 at 4:23 PM, Aren Cambre <aren(at)arencambre(dot)com> wrote:
>
>> I have a multi-threaded app. It uses ~22 threads to query Postgres.
>>
>> Postgres won't use more than 1 CPU core. The 22-threaded app only has 3%
>> CPU utilization because it's mostly waiting on Postgres.
>>
>> Here's the details:
>>
>> The app has a "main" thread that reads table A's 11,000,000 rows, one at a
>> time. The main thread spawns a new thread for each row in table A's data.
>> This new thread:
>>
>> 1. Opens a connection to the DB.
>> 2. Does some calculations on the data, including 1 to 102 SELECTs on
>> table B.
>> 3. With an INSERT query, writes a new row to table C.
>> 4. Closes the connection.
>> 5. Thread dies. Its data is garbage collected eventually.
>>
>> Physical/software details:
>>
>> - Core i7 processor--4 physical cores, but OS sees 8 cores
>> via hyper-threading
>> - 7200 RPM 500 GB HDD
>> - About 1/2 total RAM is free during app execution
>> - Windows 7 x64
>> - Postgres 9.0.4 32-bit (32-bit required for PostGIS)
>> - App is C# w/ .NET 4.0. PLINQ dispatches threads. Npgsql is Postgres
>> connection tool.
>>
>> At first, the app pounds all 8 cores. But it quickly tapers off, and only
>> 1 core that's busy. The other 7 cores are barely doing a thing.
>>
>> Postgres has 9 open processes. 1 process was slamming that 1 busy core.
>> The other 8 Postgres processes were alive but idle.
>>
>> Each thread creates its own connection. It's not concurrently shared with
>> the main thread or any other threads. I haven't disabled connection pooling;
>> when a thread closes a connection, it's technically releasing it into a pool
>> for later threads to use.
>>
>> Disk utilization is low. The HDD light is off much more than it is on, and
>> a review of total HDD activity put it between 0% and 10% of total capacity.
>> The HDD busy indicator LED would regularly flicker every 0.1 to 0.3 seconds.
>>
>> The app runs 2 different queries on table B. The 1st query is run once,
>> the 2nd query can be run up to 101 times. Table C has redundant indexes:
>> every column referenced in the SQL WHERE clauses for both queries are
>> indexed separately and jointly. E.g., if query X references columns Y and Z,
>> there are 3 indexes:
>>
>> 1. An index for Y
>> 2. An index for Z
>> 3. An index for Y and Z
>>
>> Table C is simple. It has four columns: two integers, a varchar(18), and a
>> boolean. It has no indexes. A primary key on the varchar(18) column is its
>> only constraint.
>>
>> A generalized version of my INSERT command for table C is:
>> *INSERT INTO raw.C VALUES (:L, :M, :N, :P)*
>>
>> I am using parameters to fill in the 4 values.
>>
>> I have verified table C manually, and correct data is being stored in it.
>>
>> Several Google searches suggest Postgres should use multiple cores
>> automatically. I've consulted with Npgsql's developer, and he didn't see how
>> Npgsql itself could force Postgres to one core. (See
>> http://pgfoundry.org/pipermail/npgsql-devel/2011-May/001123.html.)
>>
>> What can I do to improve this? Could I be inadvertently limiting Postgres
>> to one core?
>>
>> Aren Cambre
>>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2011-05-23 05:22:51 | Re: Postgres refusing to use >1 core |
Previous Message | Tomas Vondra | 2011-05-22 23:11:40 | Re: SORT performance - slow? |