Re: Sample data generator for performance testing

From: arun chirappurath <arunsnmimt(at)gmail(dot)com>
To: Jeremy Schneider <schneider(at)ardentperf(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Sample data generator for performance testing
Date: 2024-01-03 18:02:03
Message-ID: CAA23Sds9zP5ib0AQwT_S7qShBbDp1NhmxOXbrC+fJe9v_rA7pg@mail.gmail.com
Lists: pgsql-general

Thanks for the insights.

Thanks,
Arun

On Wed, 3 Jan, 2024, 23:26 Jeremy Schneider, <schneider(at)ardentperf(dot)com>
wrote:

> On 1/2/24 11:23 PM, arun chirappurath wrote:
> > Do we have any open source tools which can be used to create sample data
> > at scale from our postgres databases, taking data distribution and
> > randomness into account?
>
> I would suggest using the most common tools whenever possible, because
> then if you want to discuss results with other people (for example on
> these mailing lists), you're working with data sets that are widely
> used and well understood.
>
> The most common tool for PostgreSQL is pgbench, which builds a TPC-B-like
> schema that you can scale to any size. It always has the same [small]
> number of tables/columns and the same uniform data distribution, and
> there are relationships between tables, so you can create FKs if needed.
>
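
To make "scale to any size" concrete, here is a minimal sketch that computes how many rows a given pgbench scale factor produces and builds the matching initialization command. The per-scale row counts are the ones given in the pgbench documentation; the helper names and the database name "benchdb" are hypothetical, chosen only for illustration.

```python
import shlex

# Rows created per unit of scale factor by pgbench initialization,
# as listed in the pgbench documentation.
ROWS_PER_SCALE = {
    "pgbench_branches": 1,
    "pgbench_tellers": 10,
    "pgbench_accounts": 100_000,
}


def pgbench_row_counts(scale):
    """Return the expected row count of each populated table for a scale factor."""
    return {table: rows * scale for table, rows in ROWS_PER_SCALE.items()}


def pgbench_init_command(dbname, scale, foreign_keys=False):
    """Build (but do not run) a pgbench initialization command line."""
    args = ["pgbench", "--initialize", f"--scale={scale}"]
    if foreign_keys:
        # Asks pgbench to create FK constraints between its standard tables.
        args.append("--foreign-keys")
    args.append(dbname)
    return shlex.join(args)


if __name__ == "__main__":
    print(pgbench_row_counts(100))
    print(pgbench_init_command("benchdb", 100, foreign_keys=True))
```

The sketch only prints the command, so it can be reviewed before anything is run against a real database.
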
> My second favorite tool is sysbench. It supports any number of tables,
> scales easily to any size, and uses a standardized schema with a small
> number of columns and no relationships/FKs. The data distribution is
> uniformly random; on the query side, however, it supports a bunch of
> different distribution models, not just uniform random, as well as
> queries that process ranges of rows.
>
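
The difference between uniformly distributed data and a non-uniform query-side distribution can be illustrated with a short sketch. It is not sysbench's implementation (sysbench selects its own models through its --rand-type option); the power-law skew, the function names, and the table size below are all assumptions made for the example.

```python
import random


def uniform_key(rng, table_size):
    """Pick a row id uniformly from 1..table_size."""
    return rng.randint(1, table_size)


def skewed_key(rng, table_size, exponent=3.0):
    """Pick a row id from 1..table_size, biased toward low ids.

    Raising a uniform value in [0, 1) to a power > 1 pushes it toward 0,
    so low ids are requested far more often than high ids.
    """
    return 1 + int(table_size * rng.random() ** exponent)


def share_in_lowest_decile(pick, table_size, samples=20_000, seed=1):
    """Fraction of sampled lookups that land in the lowest 10% of ids."""
    rng = random.Random(seed)
    cutoff = table_size // 10
    hits = sum(1 for _ in range(samples) if pick(rng, table_size) <= cutoff)
    return hits / samples


if __name__ == "__main__":
    size = 1_000_000
    print("uniform:", share_in_lowest_decile(uniform_key, size))
    print("skewed: ", share_in_lowest_decile(skewed_key, size))
```

With the uniform picker about 10% of lookups hit the lowest decile of ids; with the cubic skew the share is a bit under half, even though the underlying table contents are the same.
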
> The other tool that I'm intrigued by these days is benchbase from CMU.
> It can do TPC-C and a bunch of other schemas/workloads, and you can scale
> the data sizes. If you're just looking at data generation and you're
> going to make your own workloads, benchbase has a lot of different
> schemas available out of the box.
>
> You can always hand-roll your schema and data with scripts & SQL, but
> the more complex and bespoke your performance test schema is, the more
> work & explaining it takes to get lots of people to engage in a
> discussion, since they need to take time to understand how the test is
> engineered. For very narrowly targeted reproductions, hand-rolling with
> a very simple schema and workload is usually the right approach, but it
> is not common for general performance testing.
>
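
As one possible shape for a hand-rolled generator, the sketch below writes reproducible, non-uniform sample rows as CSV. The "orders" table, its columns, and the chosen distributions are hypothetical; a fixed seed keeps the data set identical between runs, which makes it easier for other people to reproduce a result.

```python
import csv
import io
import random
from datetime import datetime, timedelta

COLUMNS = ["id", "customer_id", "amount", "created_at"]


def generate_orders(n_rows, n_customers, seed=42):
    """Yield rows for a hypothetical orders table, deterministically per seed."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    for order_id in range(1, n_rows + 1):
        # Pareto-distributed ids, capped: a few customers place most orders.
        customer_id = min(n_customers, int(rng.paretovariate(1.2)))
        # Log-normal amounts: many small orders, a long tail of large ones.
        amount = round(rng.lognormvariate(3.0, 1.0), 2)
        # Timestamps spread uniformly over one year.
        created_at = start + timedelta(seconds=rng.randrange(365 * 86400))
        yield order_id, customer_id, amount, created_at.isoformat(sep=" ")


def write_csv(rows, out):
    """Write a header plus all rows to a file-like object; return the row count."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    with open("orders.csv", "w", newline="") as f:
        print(write_csv(generate_orders(100_000, 5_000), f), "rows written")
```

Assuming a matching table exists, the file could then be loaded from psql with something like: \copy orders FROM 'orders.csv' WITH (FORMAT csv, HEADER true)
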
> -Jeremy
>
>
> --
> http://about.me/jeremy_schneider
>
>
