Re: citext performance

From: Deepak Somaiya <deepsom(at)yahoo(dot)com>
To: Nandakumar M <m(dot)nanda92(at)gmail(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, "david(at)kineticode(dot)com" <david(at)kineticode(dot)com>
Subject: Re: citext performance
Date: 2018-04-08 20:14:29
Message-ID: 1522749620.304966.1523218469730@mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

It is using index here , it is just that performance i.e query that use functional index (one with lower) is performing better then index created on citext column.
Deepak
On Sunday, April 8, 2018, 3:13:26 AM PDT, Nandakumar M <m(dot)nanda92(at)gmail(dot)com> wrote:

Hi,

I have also faced the same problem with citext extension. It does not
use index when thereby making it almost unusable. The problem has to
do with how collation is handled from what I have read in old threads
in postgres mailing list (please refer
https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250
).

Regards,
Nanda

On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya <deepsom(at)yahoo(dot)com> wrote:
>
> Folks,
>  I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.
>
> PostgreSQL: Documentation: 9.6: citext
>
>
>
>
> "citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lower to get case-insensitive matching."
>
>
> Here is what I have done
>
> drop table test;
> drop table testci;
>
> CREATE TABLE test (
> id INTEGER PRIMARY KEY,
> name character varying(254)
> );
> CREATE TABLE testci (
> id INTEGER PRIMARY KEY,
> name citext
>
> );
>
> INSERT INTO test(id, name)
> SELECT generate_series(1000001,2000000), (md5(random()::text));
>
> INSERT INTO testci(id, name)
> SELECT generate_series(1,1000000), (md5(random()::text));
>
>
> Now, I have done sequential search
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
> - Plan:
>    Node Type: "Seq Scan"
>    Parallel Aware: false
>    Relation Name: "test"
>    Alias: "test"
>    Startup Cost: 0.00
>    Total Cost: 23334.00
>    Plan Rows: 5000
>    Plan Width: 37
>    Actual Startup Time: 0.016
>    Actual Total Time: 680.199
>    Actual Rows: 1
>    Actual Loops: 1
>    Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>    Rows Removed by Filter: 999999
>  Planning Time: 0.045
>  Triggers:
>  Execution Time: 680.213
>
>
> explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
> - Plan:
>    Node Type: "Seq Scan"
>    Parallel Aware: false
>    Relation Name: "testci"
>    Alias: "testci"
>    Startup Cost: 0.00
>    Total Cost: 20834.00
>    Plan Rows: 1
>    Plan Width: 37
>    Actual Startup Time: 0.017
>    Actual Total Time: 1184.485
>    Actual Rows: 1
>    Actual Loops: 1
>    Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>    Rows Removed by Filter: 999999
>  Planning Time: 0.029
>  Triggers:
>  Execution Time: 1184.496
>
>
>
> You can see sequential searches with lower working twice as fast as citext.
>
> Now I added index on citext and equivalent functional index (lower) on text.
>
>
> CREATE INDEX textlowerindex ON test (lower(name));
> create index textindex on test(name);
>
>
> Index creation took longer with citext v/s creating lower functional index.
>
>
> Now here comes execution with indexes
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>    Node Type: "Bitmap Heap Scan"
>    Parallel Aware: false
>    Relation Name: "test"
>    Alias: "test"
>    Startup Cost: 187.18
>    Total Cost: 7809.06
>    Plan Rows: 5000
>    Plan Width: 37
>    Actual Startup Time: 0.020
>    Actual Total Time: 0.020
>    Actual Rows: 1
>    Actual Loops: 1
>    Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>    Rows Removed by Index Recheck: 0
>    Exact Heap Blocks: 1
>    Lossy Heap Blocks: 0
>    Plans:
>      - Node Type: "Bitmap Index Scan"
>        Parent Relationship: "Outer"
>        Parallel Aware: false
>        Index Name: "textlowerindex"
>        Startup Cost: 0.00
>        Total Cost: 185.93
>        Plan Rows: 5000
>        Plan Width: 0
>        Actual Startup Time: 0.016
>        Actual Total Time: 0.016
>        Actual Rows: 1
>        Actual Loops: 1
>        Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>  Planning Time: 0.051
>  Triggers:
>  Execution Time: 0.035
>
>
>
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>    Node Type: "Index Scan"
>    Parallel Aware: false
>    Scan Direction: "Forward"
>    Index Name: "citextindex"
>    Relation Name: "testci"
>    Alias: "testci"
>    Startup Cost: 0.42
>    Total Cost: 8.44
>    Plan Rows: 1
>    Plan Width: 37
>    Actual Startup Time: 0.049
>    Actual Total Time: 0.050
>    Actual Rows: 1
>    Actual Loops: 1
>    Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>    Rows Removed by Index Recheck: 0
>  Planning Time: 0.051
>  Triggers:
>  Execution Time: 0.064
>
>
> Deepak

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2018-04-10 02:36:27 Latest advice on SSD?
Previous Message Nandakumar M 2018-04-08 10:12:46 Re: citext performance