From: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> |
---|---|
To: | arun chirappurath <arunsnmimt(at)gmail(dot)com> |
Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Sample data generator for performance testing |
Date: | 2024-01-03 18:31:20 |
Message-ID: | 49380122-9643-43f0-bbef-bd34ab6d434d@aklaver.com |
Lists: | pgsql-general |
On 1/3/24 9:50 AM, arun chirappurath wrote:
>
>
> On Wed, 3 Jan, 2024, 23:03 Adrian Klaver, <adrian(dot)klaver(at)aklaver(dot)com>
> wrote:
>
> On 1/3/24 09:24, arun chirappurath wrote:
> > Hi Adrian,
> >
> > Thanks for your mail.
> >
> > Is this for all tables in the database or a subset? Yes
>
> Yes all tables or yes just some tables?
> All tables, except some which have user details.
>
>
> >
> > Does it need to deal with foreign key relationships? No
> >
> > What are the sizes of the existing data and what size sample data do you
> > want to produce? 1GB and 1GB test data.
>
> If the source data is 1GB and the test data is 1GB then there is no
> sampling, you are using the data population in its entirety.
>
> Yes. I would like to double the load and test.
>
Does that mean you want to take the 1GB of your existing data and double
it to 2GB while maintaining the data distribution from the original data?
>
> Also, do we have any standard methods for sampling and generating test data?
Something like this?:
https://www.postgresql.org/docs/current/sql-select.html
"TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ]
A TABLESAMPLE clause after a table_name indicates that the
specified sampling_method should be used to retrieve a subset of
the rows in that table. This sampling precedes the application of
any other filters such as WHERE clauses. The standard PostgreSQL
distribution includes two sampling methods, BERNOULLI and
SYSTEM, and other sampling methods can be installed in the
database via extensions
...
"
Read the rest of the documentation for TABLESAMPLE to get the details.
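As a rough sketch of what that could look like in practice, assuming a
hypothetical table named "orders" (the table name, the 10 percent figure and
the seed 42 are placeholders for illustration, not anything from your
database):

```sql
-- "orders" is a hypothetical table name; substitute one of your own.

-- BERNOULLI scans the whole table and keeps each row independently
-- with roughly the given probability (here about 10 percent).
-- REPEATABLE with a seed aims to return the same sample on repeated
-- runs, provided the table has not changed in between.
SELECT count(*)
FROM orders TABLESAMPLE BERNOULLI (10) REPEATABLE (42);

-- SYSTEM picks whole blocks (pages) instead of individual rows.
-- It is usually faster, but the rows it returns come in clusters,
-- so the sample can be less random than BERNOULLI.
SELECT count(*)
FROM orders TABLESAMPLE SYSTEM (10);

-- Materialize a sample into a separate table for testing.
-- "orders_sample" is likewise a made-up name.
CREATE TABLE orders_sample AS
SELECT *
FROM orders TABLESAMPLE BERNOULLI (10) REPEATABLE (42);
```

Both methods take a percentage between 0 and 100, and the number of rows
returned is approximate rather than exact. Note that this only draws a subset
of existing rows; it does not by itself double the data.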
>
>
> >
> > On Wed, 3 Jan, 2024, 22:40 Adrian Klaver,
> <adrian(dot)klaver(at)aklaver(dot)com
> > <mailto:adrian(dot)klaver(at)aklaver(dot)com>> wrote:
> >
> > On 1/2/24 23:23, arun chirappurath wrote:
> > > Hi All,
> > >
> > > Do we have any open source tools which can be used to create
> > sample data
> > > at scale from our postgres databases?
> > > Which considers data distribution and randomness
> >
> >
> >
> > >
> > > Regards,
> > > Arun
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
> >
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>