Re: Sample data generator for performance testing

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: arun chirappurath <arunsnmimt(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Sample data generator for performance testing
Date: 2024-01-03 18:31:20
Message-ID: 49380122-9643-43f0-bbef-bd34ab6d434d@aklaver.com
Lists: pgsql-general


On 1/3/24 9:50 AM, arun chirappurath wrote:
>
>
> On Wed, 3 Jan, 2024, 23:03 Adrian Klaver, <adrian(dot)klaver(at)aklaver(dot)com>
> wrote:
>
> On 1/3/24 09:24, arun chirappurath wrote:
> > Hi Adrian,
> >
> > Thanks for your mail.
> >
> > Is this for all tables in the database or a subset? Yes
>
> Yes all tables or yes just some tables?
> All tables, except some which have user details.
>
>
> >
> > Does it need to deal with foreign key relationships? No
> >
> > What are the sizes of the existing data and what size sample data do you
> > want to produce? 1GB and 1GB test data.
>
> If the source data is 1GB and the test data is 1GB then there is no
> sampling, you are using the data population in its entirety.
>
> Yes. I would like to double the load and test.
>

Does that mean you want to take the 1GB of your existing data and double
it to 2GB while maintaining the data distribution from the original data?

>
> Also, do we have any standard methods for sampling and generating test data?

Something like?:

https://www.postgresql.org/docs/current/sql-select.html

"TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ]

A TABLESAMPLE clause after a table_name indicates that the
specified sampling_method should be used to retrieve a subset of
the rows in that table. This sampling precedes the application of
any other filters such as WHERE clauses. The standard PostgreSQL
distribution includes two sampling methods, BERNOULLI and
SYSTEM, and other sampling methods can be installed in the
database via extensions

...
"

Read the rest of the documentation for TABLESAMPLE to get the details.
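
As a rough illustration of the clause quoted above, here is a minimal Python
sketch that only builds the text of a sampling query. It does not connect to
a database. The helper name tablesample_query and the table name orders are
assumptions made up for this example, not part of PostgreSQL or any driver.

```python
import re

# Hypothetical helper, not part of PostgreSQL or any driver: it only builds
# the text of a SELECT that uses the TABLESAMPLE clause quoted above.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")
_METHODS = {"BERNOULLI", "SYSTEM"}  # the two built-in sampling methods


def tablesample_query(table, method="BERNOULLI", percent=10.0, seed=None):
    """Return a SELECT that samples roughly `percent` % of `table`."""
    # Accept only plain or schema-qualified identifiers, since the name is
    # interpolated into the SQL text.
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    method = method.upper()
    if method not in _METHODS:
        raise ValueError(f"unknown built-in sampling method: {method}")
    # Both built-in methods take a percentage between 0 and 100.
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    sql = f"SELECT * FROM {table} TABLESAMPLE {method} ({percent:g})"
    if seed is not None:
        # REPEATABLE makes the sample reproducible for an unchanged table.
        sql += f" REPEATABLE ({int(seed)})"
    return sql


if __name__ == "__main__":
    # "orders" is an assumed table name used only for illustration.
    print(tablesample_query("orders", "BERNOULLI", 10, seed=42))
```

BERNOULLI picks individual rows while SYSTEM picks whole blocks, so SYSTEM is
usually faster but the rows it returns can be more clustered. Either way the
result is a subset of the existing rows, so on its own it would not grow the
data beyond the original 1GB.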

>
>
> >
> > On Wed, 3 Jan, 2024, 22:40 Adrian Klaver,
> > <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> >
> >     On 1/2/24 23:23, arun chirappurath wrote:
> >      > Hi All,
> >      >
> >      > Do we have any open source tools which can be used to create
> >      > sample data at scale from our postgres databases, and which
> >      > take data distribution and randomness into account?
> >
> >
> >
> >      >
> >      > Regards,
> >      > Arun
> >
> >     --
> >     Adrian Klaver
> >     adrian(dot)klaver(at)aklaver(dot)com
> >
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
