From: arun chirappurath <arunsnmimt(at)gmail(dot)com>
To: Jeremy Schneider <schneider(at)ardentperf(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Sample data generator for performance testing
Date: 2024-01-03 18:02:03
Message-ID: CAA23Sds9zP5ib0AQwT_S7qShBbDp1NhmxOXbrC+fJe9v_rA7pg@mail.gmail.com
Lists: pgsql-general
Thanks for the insights.
Thanks,
Arun
On Wed, 3 Jan 2024 at 23:26, Jeremy Schneider <schneider(at)ardentperf(dot)com>
wrote:
> On 1/2/24 11:23 PM, arun chirappurath wrote:
> > Do we have any open source tools which can be used to create sample
> > data at scale from our postgres databases? Ones which consider data
> > distribution and randomness?
>
> I would suggest using the most common tools whenever possible, because
> if you want to discuss results with other people (for example on these
> mailing lists), you're working with data sets that are widely and well
> understood.
>
> The most common tool for PostgreSQL is pgbench, which implements a
> TPC-B-like schema that you can scale to any size. It always uses the
> same [small] number of tables/columns and the same uniform data
> distribution, and there are relationships between the tables so you can
> create FKs if needed.
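A minimal pgbench session might look like the following. This is a
sketch: it assumes a reachable database named testdb (any database name
works), and the guards make it a no-op on machines without pgbench.

```shell
# Each scale unit adds 100,000 rows to pgbench_accounts, so scale 50
# yields roughly 5,000,000 accounts rows.
SCALE=50

if command -v pgbench >/dev/null; then
    # Initialize the TPC-B-like schema, including foreign keys.
    pgbench -i -s "$SCALE" --foreign-keys testdb || true
    # Run the workload: 10 clients, 2 worker threads, 60 seconds.
    pgbench -c 10 -j 2 -T 60 testdb || true
fi

echo "pgbench_accounts rows at scale $SCALE: $((SCALE * 100000))"
```

The `|| true` guards only keep the sketch from aborting where no server
is reachable; drop them in real use so failures are visible.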
>
> My second favorite tool is sysbench. Any number of tables, easily
> scaled to any size, a standardized schema with a small number of
> columns and no relationships/FKs. Data distribution is uniformly
> random; however, on the query side it supports a number of different
> distribution models, not just uniform random, as well as queries
> processing ranges of rows.
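A sysbench invocation along these lines would load the data and then
query it with a non-uniform access pattern. This is a sketch assuming
sysbench's PostgreSQL driver and a database/user named sbtest; exact
option spellings can vary between sysbench versions.

```shell
# Load 10 tables of 1,000,000 rows each (~10M rows total).
TABLES=10
ROWS=1000000

if command -v sysbench >/dev/null; then
    sysbench oltp_read_write \
        --db-driver=pgsql --pgsql-db=sbtest --pgsql-user=sbtest \
        --tables="$TABLES" --table-size="$ROWS" \
        prepare || true
    # Query with a Pareto rather than uniform distribution.
    sysbench oltp_read_write \
        --db-driver=pgsql --pgsql-db=sbtest --pgsql-user=sbtest \
        --tables="$TABLES" --table-size="$ROWS" \
        --rand-type=pareto --time=60 run || true
fi

echo "total rows loaded: $((TABLES * ROWS))"
```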
>
> The other tool that I'm intrigued by these days is benchbase from CMU.
> It can run TPC-C and a number of other schemas/workloads, and you can
> scale the data sizes. If you're just looking at data generation and
> plan to write your own workloads, benchbase has a lot of different
> schemas available out of the box.
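For orientation, a benchbase TPC-C run is driven by a single jar plus an
XML config. This sketch follows the project's documented layout; the
sample config path is an assumption and should be checked against the
README of the benchbase release you build.

```shell
# Fetch benchbase and run TPC-C against PostgreSQL: create the schema,
# load the data, then execute the workload. Guarded so the sketch is a
# no-op where git/java are unavailable.
if command -v git >/dev/null && command -v java >/dev/null; then
    git clone --depth 1 https://github.com/cmu-db/benchbase.git || true
    # Build per the project README (Maven, PostgreSQL profile), then:
    java -jar benchbase.jar -b tpcc \
        -c config/postgres/sample_tpcc_config.xml \
        --create=true --load=true --execute=true || true
fi
```

The data size is controlled by the scale factor (warehouse count for
TPC-C) inside the XML config rather than on the command line.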
>
> You can always hand-roll your schema and data with scripts & SQL, but
> the more complex and bespoke your performance test schema is, the more
> work and explanation it takes to get people to engage in a discussion,
> since they need time to understand how the test is engineered. For
> very narrowly targeted reproductions this is usually the right
> approach, with a very simple schema and workload, but it is less
> suitable for general performance testing.
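As an illustration of the hand-rolled approach, generate_series plus
random() covers a lot of ground. The table and column names here are
invented for the example, and it assumes psql can reach a database named
testdb.

```shell
if command -v psql >/dev/null; then
    psql testdb <<'SQL' || true
CREATE TABLE IF NOT EXISTS sample_events (
    id         bigint PRIMARY KEY,
    user_id    int,
    amount     numeric(10,2),
    created_at timestamptz
);

-- 1M rows: uniform user_id, exponentially skewed amounts, and
-- timestamps spread over the last 30 days.
INSERT INTO sample_events
SELECT g,
       (random() * 10000)::int,
       round((-100 * ln(1 - random()))::numeric, 2),
       now() - (random() * interval '30 days')
FROM generate_series(1, 1000000) AS g;
SQL
fi
```

Note the `ln(1 - random())` form: random() returns values in [0, 1), so
this avoids ever taking ln(0) while still giving an exponential skew.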
>
> -Jeremy
>
>
> --
> http://about.me/jeremy_schneider
>
>