Re: postgres slower on nested queries

From: Alexander Elgert <alexander_elgert(at)adiva(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres slower on nested queries
Date: 2007-03-07 11:12:52
Message-ID: 45EE9E34.2010900@adiva.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Richard Huxton schrieb:
> Alexander Elgert wrote:
>> Hello,
>>
>> I programmed a little script which iterates over all databases in a
>> DBMS, iterating over all tables and then over all columns.
>> This skript works for mysql and postgres.
>>
>> 1. Solution overview
>> ====================
>>
>> foreach database {
>> foreach table {
>> foreach column {
>> do something ...
>> }
>> }
>> }
>
> OK - obviously not the way to do it if you really want all the columns
> in all the databases...
>
>> 1. Solution detail
>> ==================
>>
>> foreach database
>> ----------------
>> mysql: show databases
>> postgres: SELECT d.datname as Name FROM pg_catalog.pg_database d
>> WHERE (d.datname != 'template0' AND d.datname != 'template1') ORDER
>> BY 1;)
>
> You can have template databases called other names too of course.
>
>> foreach table
>> -------------
>> mysql: show tables
>> postgres: select table_name from information_schema.tables where
>> table_schema = 'public')
>>
>> foreach column
>> --------------
>> mysql: show columns from '%s'
>> postgres: select column_name from information_schema.columns where
>> table_name = '%s')
>>
>> (If there are better queries for postgres, please let me know.)
>>
>> 2. Solution
>> ===========
>>
>> I found the postgres version VERY slow, so a decided to fetch
>
> Define VERY - it took what, milliseconds to do this? Seconds? Hours?

I wrote two different functions to get the data out of the database, the
one uses a huge amount of queries, the other uses an array to hold the data.
Because in mysql I have to use "show tables/show columns from ..." I
thought I can use the same style in postgres.

Of course, it works, but the unoptimized postgres code is 80 times
slower. ;(
The overhead for a single 'show columns from ...' seems to be small in
mysql.
Please read on, I hope I am able to show you the difference.

>
>> select table_name, column_name from information_schema.columns
>> where table_schema = 'public'
>>
>> and wrote the output to an two dimensional array to process the
>> elements later:
>> $tableA[$row['table_name']][$row['column_name']]=True;
>
> Not sure what this code is meant to show.
foreach database {
foreach table_name, column_name {
$tableA[$row['table_name']][$row['column_name']]=True;
}
// process 2-dim $tableA array
}

This results in a structure where I can itereate over all keys in the
2-dim array.
You can see I iterate first over the databases and then over table AND
columns!
--- mysql: ~1s (Database X)
--- postgres: ~1s (Database Y)
;)

In contrast: =======================================================

foreach database {
foreach table {
foreach column {
do something ...
}
}
}
--- mysql: ~1s (Database X)
--- postgres: ~80s (Database Y)
;(

>
>> The second approach ist much faster, this must be because there is no
>> nesting. ;(
>
> What nesting? Are you trying to do sub-queries of some sort?
I did a loop over all tables and THEN calling a query for each table to
get the columns (from the same table).
Yes, there are definitively more queries the DBMS has to manage.
(It is a bad style, but it is intuitive. Maybe the overhead of a single
query is more time consuming than in mysql.)
>
> In your first example, you were querying for individual rows, in the
> second you fetch them all at once. Of course it's quicker to run one
> query and return many rows than many queries each returning one row.
> Otherwise your RDBMS has something seriously wrong with it.
>
>> Further - I think - there is no real caching done in the PHP Library
>> of xampp 1.5.4, php_pgsql.dll ;(
>
> No (not that I'm sure what xampp is). Look into memcache/pg_memcache
> if you want caching - that does it properly.
Thanks.
(xampp is an environment to develop wep applications.)
>
>
>> It would be very helpful what causes the problem.
>
> You haven't actually described a problem afaict. Can you describe what
> it is you're trying to do? Do you want:
> 1. Details of a single column
> 2. To loop through all the columns doing something
> If you can say which of these you're trying to do, and what the
> "something" is, then I'm sure there's somebody here who can help.
I think it will be the best to provide example code.
>
>> Another question rises from this task:
>> ======================================
>>
>> Is there any possibility to use the shortcut \d as a query from PHP?
>> I used psql -E to display the query, but these queries rely on the
>> system catalogs, but the page:
>> http://www.postgresql.org/docs/8.2/interactive/information-schema.html
>> says they are not stable.
>> And - of course - it is much easier to type! ;)
>
> Nope - \d is psql only. That's because what it does changes with the
> system catalogues. If you want a stable representation, use the
> information_schema. If you want all the PostgreSQL-specific details
> you'll have to cope with changes to the system catalogues. However, I
> seem to remember some utility views that pgadmin uses that might be of
> use.
It would be nice to have the \d-query stored somewhere in the database.
>
> Oh - and you might find phppgadmin useful to look at.
>
Thanks.

Greetings,
Alexander Elgert

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Markus Schiltknecht 2007-03-07 11:14:04 Re: real multi-master replication?
Previous Message Martijn van Oosterhout 2007-03-07 10:13:35 Re: vacuum error