From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Herouth Maoz <herouth(at)unicell(dot)co(dot)il> |
Cc: | Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, pgsql-general(at)postgresql(dot)org, tv(at)fuzzy(dot)cz |
Subject: | Re: Index creation takes more time? |
Date: | 2012-09-18 17:19:27 |
Message-ID: | CAMkU=1yNA3zp1FcFdXDeXd4LN1Lfpr-R1B2h8pyAAo1bGYw2Lw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, Sep 18, 2012 at 1:13 AM, Herouth Maoz <herouth(at)unicell(dot)co(dot)il> wrote:
> I think you hit the nail right on the head when you asked:
>
>
>
>> I wonder if they have different encoding/collations.
>
> [headdesk]Of course. One of the requirements of the upgrade was to change
> the database encoding to unicode, because previously it was in an 8-bit
> encoding and we couldn't handle international text, which has become an
> absolute necessity. So when I restored the database, I took care to create
> it in unicode first:
> So, I must, at this point, draw the conclusion that string comparison is a
> much, much heavier task in utf-8 than it is in an 8-bit encoding - or that
> the collation is the problem.
>
> Running a different test, which involves string comparison, shows a bigger
> discrepancy:
>
> select count( foo ), foo from ( select cast(random() as varchar(14)) >
> cast(random() as varchar(14)) as foo
> from generate_series (1,100000000)) asdf
> group by foo;
>
> PC: Time: 308152.090 ms
> Server: Time: 499631.553 ms
I think the one below will show an even larger discrepancy. You are
doing 2 casts for each comparison,
so I think the casts overhead will dilute out the comparison.
select count(distinct foo) from ( select cast(random() as varchar(14)) as foo
from generate_series (1,100000000)) asdf;
> Finally, I created a test table, as you asked:
>
>
>> create table foo as select msisdn,sme_reference from
>> sms.billing__archive limit 1000000;
>
> Then I created an index on the msisdn and sme_reference columns together.
> 99% of the data in the msisdn field consist of 11-digit phone numbers.
> Result:
>
> PC: 5792.641 ms
> Server: 23740.470 ms
>
> Huge discrepancy there.
try:
create index ON foo (msisdn COLLATE "C", sme_reference) ;
This can only be done on 9.1 server, as that feature is new to that
release. It should be much faster to create than the index with
default collation.
(or change the collation of msisdn column definition, rather than just
in the index).
This assumes you just need the index for equality, not for some
precise locale-specific ordering (which for phone numbers seems like a
safe bet).
Cheers,
Jeff
From | Date | Subject | |
---|---|---|---|
Next Message | Eden Cardim | 2012-09-18 18:20:31 | Re: Column aliases in WHERE clauses |
Previous Message | Rob Sargent | 2012-09-18 16:11:26 | Re: Time-based trigger |