Quick Links

Re: Multithreaded query onto 4 postgresql instances

From:	Alessandro Candini <candini(at)meeo(dot)it>
To:	Allan Kamau <kamauallan(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Multithreaded query onto 4 postgresql instances
Date:	2011-02-15 08:35:24
Message-ID:	4D5A3ACC.601@meeo.it
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Il 14/02/2011 21:00, Allan Kamau ha scritto:
> On Mon, Feb 14, 2011 at 10:38 AM, Alessandro Candini<candini(at)meeo(dot)it> wrote:
>> No, this database is on a single machine, but a very powerful one.
>> Processors with 16 cores each and ssd disks.
>>
>> I already use partitioning and tablespaces for every instance of my db and I
>> gain a lot with my splitted configuration.
>> My db is pretty huge: 600 milions of records and partitioning is not
>> enough...
>> I performed tests with a query returning more or less 100000 records and
>> using my C module I obtain the following results (every test performed
>> cleaning cache before):
>> - single db: 9.555 sec
>> - splitted in 4: 5.496 sec
>>
>> So I think this can be a good approach...
>> I have already read this
>>
>> http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
>>
>> I posted it in my previous thread. But it is not clear to me how to embed my
>> C function into postgresql.
>> I mean, I know how to compile and insert it into postgresql, I've already
>> done it for simple function which return a boolean, for example.
>> But it is not clear to me how to do this if I want as return a complete
>> table, or set of rows...
>> Can you give me a minimalistic example?
>>
>> By the way, my goal is to perform a SELECT query in the smallest time
>> possible.
>> For that reason an SQL function is not good beacuse slower than a C
>> function...
>>
>> Thank you!
>>
>>> On 10 Feb 2011, at 9:01, Alessandro Candini wrote:
>>>
>>>> I have installed 4 different instances of postgresql-9.0.2 on the same
>>>> machine, on ports 5433, 5434, 5435, 5436.
>>> I do hope you intend to put those databases on different machines
>>> eventually, or some such? Otherwise you probably didn't gain anything by
>>> splitting your database up like that - you've just reduced the available
>>> resources on that single machine.
>>>
>>>> Why I have to do something tricky like this is long too explain...
>>> It would help to know what you're trying to achieve by splitting your
>>> database up like this. We don't need the full story, just a summary is fine;
>>> Maybe this is some experimental setup that's more related to multi-threading
>>> than to the actual database design? Maybe management smoked something
>>> outlandish and put you up with this? Maybe this is a macroscopic attempt to
>>> table partitioning?
>>>
>>>> Then I have developed a C function using libpq which creates 4 threads,
>>>> each one which query a 1/4 of the db. After that I merge the results in one
>>>> single response.
>>>>
>>>> My function works fine, but I need to include it inside a postgresql
>>>> instance in order to launch it as a normal SQL query (SELECT myfunc(...);).
>>>> I have read the documentation here
>>>> http://www.postgresql.org/docs/9.0/static/xfunc-sql.html#XFUNC-SQL-FUNCTIONS-RETURNING-SET,
>>>> but I'm pretty confuse and I don't know if this is the right way to
>>>> accomplish my goal.
>>> What are you confused about? That's a chapter about set-returning
>>> functions written in SQL - perhaps you're looking for information about
>>> set-returning functions from an external library like yours? Perhaps you're
>>> looking for: http://www.postgresql.org/docs/9.0/interactive/xfunc-c.html
>>>
>>> You say you don't know if this is the right way to accomplish your goal -
>>> which is...?
>>>
>>>> Have you got any ideas or suggestions?
>>> As others have suggested, you should probably have a look at
>>> table-partitioning, possibly in combination with tablespaces if you want to
>>> divide your database among multiple disks/filesystems. Putting them in
>>> different servers on the same hardware is probably not going to be a very
>>> good solution. As you've already found out, it makes querying the data silly
>>> difficult. But, we don't know the reason you're doing that of course.
>>>
>>> Alban Hertroys
>>>
>>> --
>>> If you can't see the forest for the trees,
>>> cut the trees and you'll see there is no forest.
>>>
>>>
>>> !DSPAM:1234,4d567a2a11731320518513!
>>>
>>>
>>
>> --
>> Alessandro Candini
>>
>> MEEO S.r.l.
>> Via Saragat 9
>> I-44122 Ferrara, Italy
>> Tel: +39 0532 1861501
>> Fax: +39 0532 1861637
>> http://www.meeo.it
>>
>> ========================================
>> "ATTENZIONE:le informazioni contenute in questo messaggio sono
>> da considerarsi confidenziali ed il loro utilizzo č riservato unicamente
>> al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
>> per errore č tenuto ad informare il mittente ed a rimuoverlo
>> definitivamente da ogni supporto elettronico o cartaceo."
>>
>> "WARNING:This message contains confidential and/or proprietary
>> information which may be subject to privilege or immunity and which
>> is intended for use of its addressee only. Should you receive this
>> message in error, you are kindly requested to inform the sender and
>> to definitively remove it from any paper or electronic format."
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
> Could it be that your single instance PostgreSQL seems slow because a
> query can only use one process(or), and in your case you could be
> maxing out the bandwidth of a single core. And the multi-instance
> implementation is "faster" simply because of simultaneous queries
> using one core per query and hence more bandwidth in total.
>
> Now depending on the situation you are trying to solve, the simple
> test you've present may not perform very well in a real situation for
> example.
> 1)If you intend to perform single client, single user, single query
> then you may want to formulating a series of queries each different
> from another by some value (range) in its where clause then run these
> queries in parallel then merge the results somehow. I think this setup
> will still be faster than the multi-instance setup.
>
> 2)If such large queries will be run simultaneous hence the use of
> several CPU cores, the single instance will again most likely
> outperform the multi-instance setup.
>
> Try simulating your situation my writing some threaded client and
> seeing the performance.
>
>
> Allan.
Writing a threaded client?!
But it is exactly what I did!

Is there a postgresql forum? Using a mailing list is so frustrating...

--
Alessandro Candini

MEEO S.r.l.
Via Saragat 9
I-44122 Ferrara, Italy
Tel: +39 0532 1861501
Fax: +39 0532 1861637
http://www.meeo.it

========================================
"ATTENZIONE:le informazioni contenute in questo messaggio sono
da considerarsi confidenziali ed il loro utilizzo è riservato unicamente
al destinatario sopra indicato. Chi dovesse ricevere questo messaggio
per errore è tenuto ad informare il mittente ed a rimuoverlo
definitivamente da ogni supporto elettronico o cartaceo."

"WARNING:This message contains confidential and/or proprietary
information which may be subject to privilege or immunity and which
is intended for use of its addressee only. Should you receive this
message in error, you are kindly requested to inform the sender and
to definitively remove it from any paper or electronic format."

In response to

Re: Multithreaded query onto 4 postgresql instances at 2011-02-14 20:00:07 from Allan Kamau

Responses

Re: Multithreaded query onto 4 postgresql instances at 2011-02-15 09:45:02 from Christian Ullrich

Browse pgsql-general by date

	From	Date	Subject
Next Message	Christian Ullrich	2011-02-15 09:45:02	Re: Multithreaded query onto 4 postgresql instances
Previous Message	Alessandro Candini	2011-02-15 08:32:55	Re: Multithreaded query onto 4 postgresql instances