Select random lines of a table using a probability distribution

From: "Jira, Marcel" <Marcel(dot)Jira(at)wu(dot)ac(dot)at>
To: "'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org>
Subject: Select random lines of a table using a probability distribution
Date: 2011-07-13 13:27:10
Message-ID: D793F5C522F1DD40BB9DC43586C57637E098CC87C6@MBX-B.ad.wu-wien.ac.at
Lists: pgsql-sql

Hi!

Suppose I have a table with the following columns:

id qualification gender age income

I'd like to select a number of lines (for example, 100) from this table at random, but the random selection has to follow a given probability distribution.

I want to use this procedure to construct a test group for another selection.

Example:

I filter all lines having the qualification "plumber".
I get 50 distinct ids: 40 males and 10 females, with a certain age distribution.

I also get some information concerning the income of the plumbers.

Now I want to know whether the income is influenced more by the gender and age distribution or by the qualification "plumber".

Therefore I would like to select a test group (of 50 or more) without any plumbers. This test group has to follow the same age and gender distribution.

Then I would be able to compare this group's income statistics with the plumbers' income statistics.

Is this possible (and doable with reasonable effort) in PostgreSQL?
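
One way such a matched test group could be drawn is stratified sampling: count the plumbers in each gender/age cell, shuffle the non-plumbers within each cell, and keep as many of them per cell as there are plumbers. The following is only a minimal sketch under stated assumptions, not a definitive answer. The table name `people`, the 10-year age bands (`age / 10`), and the demo data are all hypothetical. The query is run here through Python's built-in sqlite3 module so that the example is self-contained; it uses only `random()`, integer division, and `ROW_NUMBER() OVER (...)`, which PostgreSQL also provides, but it has not been tested against a PostgreSQL server.

```python
import random
import sqlite3
from collections import Counter

# Hypothetical query: the table name `people` and the 10-year age bands
# (age / 10, integer division) are assumptions made for this sketch.
MATCHED_SAMPLE_SQL = """
WITH target AS (          -- number of plumbers in each gender/age cell
    SELECT gender, age / 10 AS age_band, COUNT(*) AS n
    FROM people
    WHERE qualification = 'plumber'
    GROUP BY gender, age / 10
),
ranked AS (               -- non-plumbers, randomly ordered within each cell
    SELECT p.id, p.gender, p.age / 10 AS age_band, p.income,
           ROW_NUMBER() OVER (
               PARTITION BY p.gender, p.age / 10
               ORDER BY random()
           ) AS rn
    FROM people AS p
    WHERE p.qualification <> 'plumber'
)
SELECT r.id, r.gender, r.age_band, r.income
FROM ranked AS r
JOIN target AS t
  ON t.gender = r.gender AND t.age_band = r.age_band
WHERE r.rn <= t.n         -- keep as many rows per cell as there are plumbers
"""


def build_demo_db(seed=0):
    """Create an in-memory table with made-up rows: 50 plumbers, 2000 others."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, qualification TEXT, "
        "gender TEXT, age INTEGER, income INTEGER)"
    )
    rows = []
    for i in range(50):  # 40 male and 10 female plumbers, as in the example
        gender = "m" if i < 40 else "f"
        rows.append(("plumber", gender, rng.randint(20, 59), rng.randint(1500, 4000)))
    for _ in range(2000):  # a large pool of non-plumbers to sample from
        rows.append((rng.choice(["baker", "teacher", "clerk"]), rng.choice("mf"),
                     rng.randint(20, 59), rng.randint(1500, 4000)))
    conn.executemany(
        "INSERT INTO people (qualification, gender, age, income) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


conn = build_demo_db()
control_group = conn.execute(MATCHED_SAMPLE_SQL).fetchall()

# Compare the gender/age-band counts of the plumbers with those of the sample.
plumber_cells = Counter()
for gender, band, n in conn.execute(
    "SELECT gender, age / 10, COUNT(*) FROM people "
    "WHERE qualification = 'plumber' GROUP BY gender, age / 10"
):
    plumber_cells[(gender, band)] = n
control_cells = Counter((gender, band) for _id, gender, band, _income in control_group)
print("distributions match:", control_cells == plumber_cells)
```

If a cell contains fewer non-plumbers than plumbers, the query returns fewer rows for that cell, so the match is exact only when the pool is large enough. A test group larger than the plumber group could be drawn by comparing `rn` against a multiple of `t.n` instead.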

Thank you in advance.

Best regards,

Marcel Jira

~~~ * ~~~
Mag. Marcel Jira
Institut für Sozialpolitik, Wirtschaftsuniversität Wien
+43 1 313 36-5890
UZA IV, D 317
http://www.wu.ac.at/sozialpolitik/team/wimi/jira
~~~ * ~~~
