From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Marc Mamin <M(dot)Mamin(at)intershop(dot)de> |
Cc: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Counting the occurences of a substring within a very large text |
Date: | 2015-06-24 16:21:43 |
Message-ID: | 2831.1435162903@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Marc Mamin <M(dot)Mamin(at)intershop(dot)de> writes:
> I'd like to count the number linebreaks within a string,
> but I get a memory allocation error when using regexp_matches or regexp_split_to_table.
Anything involving a regexp is going to have a maximum input string length
of about 256MB, as a result of conversion to pg_wchar format.
regexp_split_to_table(), for instance, does this:
/* convert string to pg_wchar form for matching */
orig_len = VARSIZE_ANY_EXHDR(orig_str);
wide_str = (pg_wchar *) palloc(sizeof(pg_wchar) * (orig_len + 1));
wide_len = pg_mb2wchar_with_len(VARDATA_ANY(orig_str), wide_str, orig_len);
palloc() would complain for requests beyond 1GB, and sizeof(pg_wchar) is
4, so 256MB is the longest orig_str this can handle. (This coding is on
the hairy edge of being a security bug on 32-bit machines, too, but AFAICS
it is okay because there's a factor of 4 daylight between the max possible
input length of 1GB and overflowing uint32.)
In newer branches we could consider using MemoryContextAllocHuge to escape
the 1GB limit on wide_str, but it would take some research to be sure that
everything involved is using suitable datatypes for string indices and so
forth. I'd be a bit worried about the performance implications of
throwing such large strings around, anyway.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Andy Colson | 2015-06-24 16:32:14 | Re: DB access speeds, App(linux)<->PG(linux) vs App(linux) <->MSSql(Windows) |
Previous Message | Gauthier, Dave | 2015-06-24 15:13:49 | Re: DB access speeds, App(linux)<->PG(linux) vs App(linux) <->MSSql(Windows) |