From: | Nicolas Paris <niparisco(at)gmail(dot)com> |
---|---|
To: | |
Cc: | pgsql-performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: PG 9.3 materialized view VS Views, indexes, shared memory |
Date: | 2015-03-10 08:53:20 |
Message-ID: | CA+ssMOQL1bEzG+uJWQVgbtOUm9d2OVk_x5TpMHf9xEKj1Zj5DA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
>
>
> Why are you dropping and re-loading? You mentioned it before and it
> sounded like it had something to do with adding columns, but you
don't have to drop and reload to add a column.
Adding a NULL column is fast. Dropping one too. I need to set some row as
TRUE. I can do it with an update, but in postgresql update is done by
delete then insert with copy of the row. This is really slow. A drop
cascade, then bulk load is better.
This is not the only reason. Drop & load simplify all the ETL process. No
question of delta changes and no "fuck brain" when a problem occurs or a
modification of the table. I've tested, it loads 20milion rows in 5 min
(without time for reindexing and time to retrieve datas)
2015-03-10 9:31 GMT+01:00 Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>:
> On 3/9/15 8:17 AM, Nicolas Paris wrote:
>
>> (sorry for top-posting, gmail does not help.)
>>
>
> *shakes fist at gmail*
>
> Thanks to your advice Jim, I have done an other test :
>> No partial indexes, just a partial index on boolean columns does the
>> job. (I get same perfs as MV)
>> CREATE INDEX ..ON (BoolColumnX) WHERE BoolColumnX IS TRUE
>>
>> Then VIEW =
>> SELECT colA....colZ
>> FROM huge_table
>> WHERE BoolColumnX IS TRUE
>>
>> Then this only index is used 800times (for each bool col) and saves
>> place as it does'nt indexes NULL values, and does no replicate. subsets.
>> Moreover the huge indexes are allways loaded in cache memory.
>>
>
> Cool. :)
>
> According to this link
>> http://postgresql.nabble.com/NULL-saves-disk-space-td4344106.html
>> NULL values do not take place if only one other column are null for
>> that row.
>> Boolean takes 1 byte wheras smallint 2bytes.
>> Then the space problem is not anymore a problem with boolean columns
>> 95% empty
>>
>> One thing that is really great with postgresql is transaction for
>> drop table cascade, that allow te restore all stuf index, views on a
>> rollback if problem in loading appears.
>> I hope using one transaction to drop/load many table is not a
>> performance issue ?
>>
>
> Why are you dropping and re-loading? You mentioned it before and it
> sounded like it had something to do with adding columns, but you don't have
> to drop and reload to add a column.
>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>
From | Date | Subject | |
---|---|---|---|
Next Message | Gunnlaugur Thor Briem | 2015-03-11 15:54:40 | Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT |
Previous Message | Jim Nasby | 2015-03-10 08:31:29 | Re: PG 9.3 materialized view VS Views, indexes, shared memory |