pgBadger 3 released : now with parallel parsing

From: damien clochard <damien(at)dalibo(dot)info>
To: PostgreSQL Announce <pgsql-announce(at)postgresql(dot)org>
Subject: pgBadger 3 released : now with parallel parsing
Date: 2013-02-26 14:33:56
Message-ID: 512CC7D4.4080002@dalibo.info
Lists: pgsql-announce

Paris, France - February 25th, 2013

DALIBO is proud to announce the release of pgBadger v3, the new
PostgreSQL log analyzer. pgBadger is built for speed and delivers fully
detailed reports from your PostgreSQL log files.

This new release brings significant improvements. All pgBadger users
should upgrade as soon as possible.

===== pgBadger 3 parallel log parsing =====

The first versions of pgBadger were bound to only one CPU. The
PostgreSQL log files were scanned sequentially. Analyzing very large log
files could take several hours.

This limitation is now removed. You can use as many CPU cores as you
want and scan your logs in parallel.

To enable parallel processing, just use the -j N option, where N is the
number of cores you want to use.
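
For example, to parse a single log file with 8 cores (the log path and
output file name below are only illustrative, adjust them to your setup):

    pgbadger -j 8 -o report.html /var/log/postgresql/postgresql-9.2-main.log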

Please note that parallel mode has a small drawback: because each log
file is split into N pieces, a query that falls on a boundary may be
truncated. If you enable N cores, the results may differ by at most N
queries per log file.

However, this is a minor issue: parallel mode is interesting when you
have millions of queries to analyze. And if you have millions of queries
in a log file, you can afford to lose a few, as it is quite unlikely
that the lost queries would have changed the overall results.

If you want to avoid this problem entirely, you can use pgBadger's
"per-file parallel mode" to analyze your logs, at the cost of lower
performance than the standard parallel mode. To enable this behaviour,
use the "-J N" option instead of "-j N". Per-file mode starts to be
really interesting when there are hundreds of small log files (e.g. a
10MB rotation size limit) and at least 8 cores.
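
For instance, to dispatch hundreds of rotated log files across 8 cores
(the file pattern below is again only illustrative):

    pgbadger -J 8 -o report.html /var/log/postgresql/postgresql-*.log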

===== How fast is pgBadger 3? =====

The goal was to let pgBadger use as many cores as specified for parallel
log parsing. Here are some performance results obtained by running
pgbadger on five log files totaling 9.5 GB:

* with 1 core => 1h 41m 18s
* with 2 cores => 50m 25s
* with 4 cores => 25m 39s
* with 8 cores => 15m 58s

We feel this performance gain is quite interesting: scaling is close to
linear up to 4 cores, and 8 cores still deliver roughly a 6x speed-up :-)

===== New binary format =====

In addition to the classic HTML, TXT and Tsung output formats, pgBadger
3 is now able to generate a binary input/output format. This new format
is useful if you want to store only the log statistics and generate the
HTML report with graphs later.

In a nutshell, the two main activities of pgBadger are parsing and
reporting. With this binary format, you can now split those activities
and run them at different times. For example, you can parse your logs
once a day and generate the HTML reports only when needed.
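
Here is a possible workflow (the file names are illustrative; pgbadger
selects the output format from the file extension, so the .bin suffix
matters here):

    # parse today's log and store only the statistics
    pgbadger -o monday.bin /var/log/postgresql/postgresql-Mon.log
    # later, build the HTML report from the stored statistics
    pgbadger -o report.html monday.bin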

You can also combine several binary files into a single report. For
instance, you may create a binary file every week and aggregate the last
4 weekly files to build a monthly report in HTML, as sketched below.
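
A monthly aggregation could then look like this (all file names and
paths are hypothetical):

    # one binary file per week
    pgbadger -o week1.bin /var/log/postgresql/postgresql-week1*.log
    pgbadger -o week2.bin /var/log/postgresql/postgresql-week2*.log
    pgbadger -o week3.bin /var/log/postgresql/postgresql-week3*.log
    pgbadger -o week4.bin /var/log/postgresql/postgresql-week4*.log
    # aggregate the four weekly files into one monthly HTML report
    pgbadger -o monthly_report.html week1.bin week2.bin week3.bin week4.bin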

This new binary format is also compatible with other tools such as
pgShark (https://github.com/dalibo/pgshark/).

===== More stats, more pie charts! =====

This major release also has additional features:

* New pie charts showing the number of autovacuum runs per table and the
number of tuples removed by autovacuum per table
* No more distinction between the log_duration, log_statement and
log_min_duration_statement formats
* New tuples/pages removed counters in the per-table VACUUM report
* New VACUUM and ANALYZE hourly reports and graphs

... and many bugfixes.

For the complete list of changes, please check out the release notes at
https://github.com/dalibo/pgbadger/blob/master/ChangeLog

===== Deprecated options =====

**WARNING**: for the sake of simplicity, the
''--enable-log_min_duration'' and ''--enable-log_duration'' command line
options have been removed. pgBadger now parses log_duration,
log_statement and log_min_duration_statement lines without distinction
and adapts its reports to whichever of those lines it finds.

If you are running pgBadger via cron, please take care: if one of these
options appears on the command line, pgbadger will refuse to start.
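
A typical fix is simply to drop the removed option from the crontab
entry; the schedule and paths below are purely hypothetical:

    # before (pgbadger 3 will refuse to start):
    0 4 * * * pgbadger --enable-log_min_duration -o /var/www/pg_report.html /var/log/postgresql/postgresql.log
    # after:
    0 4 * * * pgbadger -o /var/www/pg_report.html /var/log/postgresql/postgresql.log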

===== Links & Credits =====

DALIBO would like to thank the developers who submitted patches and the
users who reported bugs and feature requests, especially Matt Romaine,
Luke Cyca, Kevin Brannen, Adam Schroder, pilat66, Euler Taveira de
Oliveira, stuntmunkee, pierrestroh, Vipul, Dirk-Jan Bulsink and Vincent
Laborie.

pgBadger is an open project. Any contribution to build a better tool is
welcome. Just send your ideas, feature requests or patches using the
GitHub tools or directly to our mailing list.

Links:

* Download: https://sourceforge.net/projects/pgbadger/
* Mailing List:
https://listes.dalibo.com/cgi-bin/mailman/listinfo/pgbagder

--------------

**About pgBadger**:

pgBadger is a new-generation log analyzer for PostgreSQL, created by
Gilles Darold, who is also the author of the ora2pg migration tool.
pgBadger is a fast and easy tool to analyze your SQL traffic and create
HTML5 reports with dynamic graphs. pgBadger is the perfect tool to
understand the behavior of your PostgreSQL server and identify which SQL
queries need to be optimized.

Docs, Download & Demo at http://dalibo.github.com/pgbadger/

--------------

**About DALIBO**:

DALIBO is the leading PostgreSQL company in France, providing support,
training and consulting to its customers since 2005. The company
contributes to the PostgreSQL community in various ways, including code,
articles, translations, free conferences and workshops.

Check out DALIBO's open source projects at http://dalibo.github.com
