From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Kynn Jones" <kynnjo(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to "unique-ify" HUGE table?
Date: 2008-12-23 17:34:28
Message-ID: dcc563d10812230934y613ec899sffe5483e87c93cfd@mail.gmail.com
Lists: pgsql-performance
On Tue, Dec 23, 2008 at 10:25 AM, Kynn Jones <kynnjo(at)gmail(dot)com> wrote:
> Hi everyone!
> I have a very large 2-column table (about 500M records) from which I want to
> remove duplicate records.
> I have tried many approaches, but they all take forever.
> The table consists of two short TEXT columns. It is a
> temporary table generated from a query:
>
> CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
> Initially I tried
> CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
> but after waiting for nearly an hour I aborted the query, and repeated it
> after getting rid of the DISTINCT clause.
> Everything takes forever with this monster! It's uncanny. Even printing it
> out to a file takes forever, let alone creating an index for it.
> Any words of wisdom on how to speed this up would be appreciated.
Did you try cranking up work_mem to something that's a large
percentage (25 to 50%) of total memory?
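As a sketch (the exact value depends on your RAM and PostgreSQL version; '1GB' is an assumed figure here, and older releases cap work_mem around 2GB), that would look something like:

```sql
-- Raise work_mem for this session only, so the sort/hash behind
-- DISTINCT can run in memory instead of spilling to disk.
-- '1GB' is illustrative; size it to a large fraction of free RAM.
SET work_mem = '1GB';

CREATE TEMP TABLE huge_table AS
SELECT DISTINCT x, y FROM ... ;
```

Setting it per-session rather than in postgresql.conf avoids handing every backend that much memory.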
Previous message: Kynn Jones | 2008-12-23 17:25:48 | How to "unique-ify" HUGE table?
Next message: D'Arcy J.M. Cain | 2008-12-23 17:39:17 | Re: How to "unique-ify" HUGE table?