Re: [HACKERS] Index creation takes for ever

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, ohp(at)pyrenet(dot)fr, pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Index creation takes for ever
Date: 2003-12-01 05:03:38
Message-ID: 200312010503.hB153cf18965@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


Patch removed from queue.

---------------------------------------------------------------------------

Manfred Koizar wrote:
> On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> wrote:
> >ohp(at)pyrenet(dot)fr writes:
> >> it took 69 minutes to finish, 75% of this time was devoted to create 2
> >> indexes on varchar(2) with value being 'O', 'N' or null;
> >
> >I still say it's either strcoll or qsort's fault.
>
> If qsort is to blame, then maybe this patch could help. It sorts
> equal key values on item pointer. And if it doesn't help index
> creation speed, at least the resulting index has better correlation.
>
> Test script:
> CREATE TABLE t (i int NOT NULL, t text NOT NULL);
> INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf');
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf');
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> ANALYZE t;
> CREATE INDEX t_i ON t(i);
> SET enable_seqscan = 0;
> SELECT ctid FROM t WHERE i=100 LIMIT 10;
>
> Result without patch:
> ctid
> ----------
> (153,14)
> (306,23)
> (305,80)
> (152,91)
> (76,68)
> (38,34)
> (153,34)
> (305,50)
> (9,62)
> (305,40)
> (10 rows)
>
> Result with patch:
> ctid
> --------
> (0,5)
> (0,10)
> (0,15)
> (0,20)
> (0,25)
> (0,30)
> (0,35)
> (0,40)
> (0,45)
> (0,50)
> (10 rows)
>
> For testing purposes I have made a second patch that provides a
> boolean GUC variable sort_index. It is available here:
> http://www.pivot.at/pg/23.test-IdxTupleSort.diff
>
> Servus
> Manfred

> diff -ruN ../base/src/backend/utils/sort/tuplesort.c src/backend/utils/sort/tuplesort.c
> --- ../base/src/backend/utils/sort/tuplesort.c 2003-08-17 21:58:06.000000000 +0200
> +++ src/backend/utils/sort/tuplesort.c 2003-09-05 10:04:22.000000000 +0200
> @@ -2071,6 +2071,33 @@
> (errcode(ERRCODE_UNIQUE_VIOLATION),
> errmsg("could not create unique index"),
> errdetail("Table contains duplicated values.")));
> + else
> + {
> + /*
> + * If key values are equal, we sort on ItemPointer. This might help
> + * for some bad qsort implementation having performance problems
> + * with many equal items. OTOH I wouldn't trust such a weak qsort
> + * to handle pre-sorted sequences very well ...
> + *
> + * Anyway, this code doesn't hurt much, and it helps produce indices
> + * with better index correlation which is a good thing per se.
> + */
> + ItemPointer tid1 = &tuple1->t_tid;
> + ItemPointer tid2 = &tuple2->t_tid;
> + BlockNumber blk1 = ItemPointerGetBlockNumber(tid1);
> + BlockNumber blk2 = ItemPointerGetBlockNumber(tid2);
> +
> + if (blk1 != blk2)
> + return (blk1 < blk2) ? -1 : 1;
> + else
> + {
> + OffsetNumber pos1 = ItemPointerGetOffsetNumber(tid1);
> + OffsetNumber pos2 = ItemPointerGetOffsetNumber(tid2);
> +
> + if (pos1 != pos2)
> + return (pos1 < pos2) ? -1 : 1;
> + }
> + }
>
> return 0;
> }

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry G. Mastrukov 2003-12-01 05:12:54 default operator class: btree or hash
Previous Message Bruce Momjian 2003-12-01 05:02:54 Re: [HACKERS] Index creation takes for ever

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2003-12-01 05:04:10 Re: WIP: unique hash indexes
Previous Message Bruce Momjian 2003-12-01 05:03:28 Re: WIP: unique hash indexes