Re: tuple data size and compression

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Tom DalPozzo <t(dot)dalpozzo(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: tuple data size and compression
Date: 2016-12-16 15:10:24
Message-ID: 38569cb0-7de0-4015-feac-755950955ec6@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 12/16/2016 12:23 AM, Tom DalPozzo wrote:
> I see. But in my case rows don't reach that thresold (I didn't
> check if
> 2K but I didn't change anything). So I'm wondering if there is
> any other
> chance except the TOAST to get the rows compressed or not.
>
>
> Are you really sure you want that? For small files the overhead of
> compression tends to out weigh the benefits. A contrived example
> biased to making my point:
>
> aklaver(at)killi:~> dd if=/dev/urandom of=file.txt bs=10 count=10
> 10+0 records in
> 10+0 records out
> 100 bytes (100 B) copied, 0.253617 s, 0.4 kB/s
> aklaver(at)killi:~> l -h file.txt
> -rw-r--r-- 1 aklaver users 100 Dec 15 13:07 file.txt
> aklaver(at)killi:~> gzip file.txt
> aklaver(at)killi:~> l -h file.txt.gz
> -rw-r--r-- 1 aklaver users 132 Dec 15 13:07 file.txt.gz
>
>
> I noticed that, when I use constant data, the total IO writes (by
> iostat) are more or less 1/2 of the the total IO writes when using
> random or other data hard to compress.
>
>
> Define constant data?
>
> I thought the data you are inputting is below the compression threshold?
>
> Is I/O causing a problem or to put it another way, what is the
> problem you are trying to solve?
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
>
> Hi Adrian,
> I haven't got a problem. I'm just trying to estimate a worst case of
> total IO writes of my DB on the field, over the years.

Well worst case is some program goes haywire and saturates you I0 with
writes. I would think a more reasonable estimate would be the average
write load with a +/- standard deviation. Though that would entail
guessing more about the future then I would be willing to commit to.

> If tuples get compressed, then all depends on the data compressibility.
> So, either 1) I understand if/when they get compressed or 2) I must
> simulate with different kinds of data (compressible and not).
> Furthermore, in general I like, if possible, to be aware of what is
> going behind the scenes.
> As for 2) I made 3 tests, each inserting 1million rows of 100 bytea on
> my 'dati' field. First I tried 100 bytes=constant data=\\x7b7b7b7b....
> then I tried 100 bytes=random and finally inserting 100 bytes taken
> from random offsets of a compressed file.
> It seems that IO writes strongly depend on the data, even if dati field
> length is just 100.

That goes without saying.

Two questions:

1) Do you know what your data in the future is going to be?

2) Is a 100 byte bytea a realistic approximation of that data?

> wal_compression=off.
> Regards
> Pupillo
>
>
>
>
>
>
>
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vladimir Rusinov 2016-12-16 15:59:29 Re: pgbench initialize
Previous Message Thomas.Deboben.ext 2016-12-16 15:01:25 Re: Windows installation - could not connect to server: Connection refused (0x0000274D/10061)