Tips/Hacks to create minimal DB from the execution of several (simple) SQL requests.

From: Daniel Shane <shaned(at)LEXUM(dot)UMontreal(dot)CA>
To: pgsql-general(at)postgresql(dot)org
Subject: Tips/Hacks to create minimal DB from the execution of several (simple) SQL requests.
Date: 2009-10-08 15:24:41
Message-ID: 1658173889.1661255015481545.JavaMail.root@vicenza.dmz.lexum.pri
Lists: pgsql-general

Hi all!

I have a problem here that I think could be of interest to everyone. I am in the process of writing test cases for our applications, and there is one problem I am facing. To test correctly, I need to create a small database (a sample, if you want) from a very large one so that I can run some tests on a subset of the data.

Sometimes you are asked to do this but know nothing about the database in advance (ugh!).

I could write several queries and build it myself by trial and error, but I was wondering if a more general approach could be worked out.

In my case, the tests do not write to the database, and the queries are simple in nature (they do not use count() or anything else that needs a whole table to work).

Here are some solutions that I have come up with:

a) Run the main program A on the large database, but restrict its operation to only a subset of the data.

If I create a MockConnection, I could save every query as a text string, serialize the result set that Postgres returned for it, and use this log to re-run the program without a connection to the real database. The mock would simply return the serialized result set whenever it finds a match for the query.
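A minimal sketch of the record/replay idea in a), written in Python. Everything here is an assumption made for illustration: the class names `RecordingConnection` and `ReplayConnection`, the `items` table, the `load_titles` function standing in for program A, and the use of an in-memory SQLite database in place of the real Postgres server.

```python
import json
import sqlite3


class RecordingConnection:
    """Wraps a real connection and logs each query with the rows it returned."""

    def __init__(self, real_conn):
        self._real = real_conn
        self.log = {}  # JSON-encoded [sql, params] -> list of rows

    def execute(self, sql, params=()):
        rows = [list(r) for r in self._real.execute(sql, params).fetchall()]
        self.log[json.dumps([sql, list(params)])] = rows
        return rows

    def dump(self):
        """Serialize the whole log so it can be stored next to the tests."""
        return json.dumps(self.log)


class ReplayConnection:
    """Answers queries from a serialized log, with no database behind it."""

    def __init__(self, serialized_log):
        self._log = json.loads(serialized_log)

    def execute(self, sql, params=()):
        key = json.dumps([sql, list(params)])
        if key not in self._log:
            # Any change to the query text or parameters ends up here.
            raise LookupError(f"no recorded result for query: {sql!r}")
        return self._log[key]


def load_titles(conn, category):
    """Hypothetical stand-in for program A: one simple read-only query."""
    rows = conn.execute(
        "SELECT title FROM items WHERE category = ? ORDER BY id", (category,)
    )
    return [row[0] for row in rows]


# Stand-in for the large database (SQLite here, purely for the sketch).
real = sqlite3.connect(":memory:")
real.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, title TEXT)")
real.executemany(
    "INSERT INTO items (category, title) VALUES (?, ?)",
    [("a", "first"), ("b", "second"), ("a", "third")],
)

recorder = RecordingConnection(real)
recorded = load_titles(recorder, "a")         # run once against the real data

replay = ReplayConnection(recorder.dump())    # later: no database needed
replayed = load_titles(replay, "a")
```

The lookup is an exact match on query text and parameters, so a query that is reworded even slightly has no recorded answer and raises `LookupError`.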

b) Take the source code of Postgres and tweak it so that it logs every table/row that was needed to produce the output result set, then build a separate minimal DB from that log.

With this option, database B (the minimal one) would not have any constraints. On the other hand, it may be very difficult for me to add this logging to the source code.
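The second half of b), building the constraint-free database B from such a log, could look roughly like the Python sketch below. It assumes the log already exists as a mapping from table name to the set of primary-key values that were read, and that every table has a key column named `id`; neither is something Postgres provides out of the box. The function name `build_minimal_db`, the sample tables, and the use of SQLite instead of Postgres are likewise only illustrative.

```python
import sqlite3


def build_minimal_db(source, touched):
    """Copy only the logged rows into a fresh database without constraints.

    `touched` maps a table name to the set of `id` values that were read.
    Table names are interpolated into SQL, so they must come from a trusted log.
    """
    target = sqlite3.connect(":memory:")
    for table, ids in touched.items():
        # Discover the column names without fetching any rows.
        cur = source.execute(f"SELECT * FROM {table} LIMIT 0")
        columns = [d[0] for d in cur.description]
        col_list = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        # Untyped columns, no keys, no foreign keys: database B has no constraints.
        target.execute(f"CREATE TABLE {table} ({col_list})")
        for row_id in sorted(ids):
            row = source.execute(
                f"SELECT {col_list} FROM {table} WHERE id = ?", (row_id,)
            ).fetchone()
            if row is not None:
                target.execute(f"INSERT INTO {table} VALUES ({marks})", row)
    target.commit()
    return target


# Stand-in for the large database.
big = sqlite3.connect(":memory:")
big.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
big.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, "
    "author_id INTEGER REFERENCES authors(id), title TEXT)"
)
big.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
big.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [(1, 1, "One"), (2, 2, "Two"), (3, 1, "Three")],
)

# Hypothetical log: the test run only ever read book 2 and author 1.
small = build_minimal_db(big, {"books": {2}, "authors": {1}})
```

Here book 2 ends up referring to an author that was never copied. That is acceptable only because the target tables carry no foreign keys, which is why database B has to be built without constraints.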

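Of course, option b) seems much more interesting: after a small optimization of the program's queries, the tests would probably still work against the minimal database, whereas with option a) they would fail immediately because the changed query no longer matches any logged one.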

I was wondering if maybe there was something else I could do to solve this problem in a general way?

Daniel Shane
