Re: Processing very large TEXT columns (300MB+) using C/libpq

From: Geoff Winkless <pgsqladmin(at)geoff(dot)dj>
To: Bear Giles <bgiles(at)coyotesong(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Cory Nemelka <cnemelka(at)gmail(dot)com>, Aldo Sarmiento <aldo(at)bigpurpledot(dot)com>, pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Processing very large TEXT columns (300MB+) using C/libpq
Date: 2017-10-23 09:36:00
Message-ID: CAEzk6ffoE052b_PMYnf_y+u=2ckn5WTYisxmPuzHj5SWmi8r_A@mail.gmail.com
Lists: pgsql-admin

On 21 Oct 2017 12:32, "Bear Giles" <bgiles(at)coyotesong(dot)com> wrote:

> In that case you must put a read lock on the string that covers the loop. If you're in
> a multi-threaded environment and not using locks when appropriate then all bets are off.

You reckon a compiler can decide to blow up your code by making
assumptions like that?

Your loop could set a var for a state machine in a processing thread
to modify the string. That doesn't preclude correct locking behaviour.

If you think that's too contrived then forget threads, you could make
a shared library call that the compiler can't assess at compile-time
that could change the string.

Yes, in either case, using strlen to check for that is poor code, but
the compiler can't assume you're not using poor code.
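To make the shared-library case concrete, here is a minimal sketch. The names walk, visit_fn, truncate_at_two and leave_alone are hypothetical and not from the thread; the function pointer only stands in for a call whose body the compiler cannot see when it compiles the loop. Because the callee is allowed to shorten the string, moving strlen out of the loop condition would change what the program does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: stands in for a shared-library call
   that the compiler cannot inspect at compile time. */
typedef void (*visit_fn)(char *s, size_t i);

/* Visits each character of s and returns how many iterations ran.
   strlen() is re-evaluated on every pass because visit() may
   legitimately modify s. */
size_t walk(char *s, visit_fn visit)
{
    size_t i, steps = 0;

    assert(s != NULL && visit != NULL);
    for (i = 0; i < strlen(s); i++) {
        visit(s, i);
        steps++;
    }
    return steps;
}

/* Example callee: cuts the string down to three characters once
   index 2 has been reached (needs a buffer of at least 4 bytes). */
void truncate_at_two(char *s, size_t i)
{
    if (i == 2)
        s[3] = '\0';
}

/* Example callee that leaves the string untouched. */
void leave_alone(char *s, size_t i)
{
    (void)s;
    (void)i;
}
```

With an 8-character buffer, walk() runs 8 times with leave_alone but only 3 times with truncate_at_two. A compiler that evaluated strlen once up front would run 8 times in both cases, which is why it has to prove the string cannot change before it may do that.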

This argument is pretty pointless. The only way to be sure to avoid
the problem is to assume that the compiler won't optimize bad code.
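In practice that means taking the length once yourself rather than hoping the optimizer does it for you. A minimal sketch, assuming the string really is not modified during the loop (copy_once is a hypothetical helper name, not something from this thread or from libpq):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies src byte by byte, calling strlen()
   exactly once. Returns a malloc'd string the caller must free,
   or NULL if allocation fails. */
char *copy_once(const char *src)
{
    size_t i, len;
    char *dst;

    assert(src != NULL);
    len = strlen(src);        /* evaluated once, outside the loop */
    dst = malloc(len + 1);    /* +1 for the terminating NUL */
    if (dst == NULL)
        return NULL;
    for (i = 0; i < len; i++)
        dst[i] = src[i];
    dst[len] = '\0';
    return dst;
}
```

With strlen in the loop condition each iteration rescans the whole string, so the work grows with the square of the length; on a 300MB value that is the difference between one pass and hundreds of millions of passes.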

FWIW gcc 4.8.5 with -O3 doesn't optimize away strlen even in code this simple:

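#include <stdio.h>
#include <stdlib.h>   /* malloc */
#include <string.h>

int main (int argc, char **argv) {
    int i;
    char *buff;
    buff = malloc(strlen(argv[1]) + 1);   /* +1 for the terminating NUL */
    /* strlen() is deliberately left in the loop condition: it is the call under test */
    for (i = 0; i < strlen(argv[1]); i++) {
        buff[i] = argv[1][i];
    }
    buff[i] = '\0';   /* terminate before printing with %s */
    printf("%s", buff);
    return 0;
}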

.L3:
        movzbl  0(%rbp,%rbx), %edx
        movb    %dl, (%r12,%rbx)
        movq    8(%r13), %rbp
        addq    $1, %rbx
.L2:
        movq    %rbp, %rdi
        call    strlen
        cmpq    %rax, %rbx
        jb      .L3

However, it _does_ optimize this code:

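#include <stdio.h>
#include <stdlib.h>   /* malloc */
#include <string.h>   /* strdup, strlen */

int main (int argc, char **argv) {
    int i;
    char *buff;
    char *buff2;
    buff2 = strdup(argv[1]);
    buff = malloc(strlen(buff2) + 1);   /* +1 for the terminating NUL */
    /* strlen() is deliberately left in the loop condition: it is the call under test */
    for (i = 0; i < strlen(buff2); i++) {
        buff[i] = buff2[i];
    }
    buff[i] = '\0';   /* terminate before printing with %s */
    printf("%s", buff);
    return 0;
}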

I assume that's because it can be certain at compile time that, since
both buff and buff2 are local, nothing else is going to modify the
source string (without some stack smashing, anyway).

Geoff
