From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, "Jason O'Donnell" <odonnelljp01(at)gmail(dot)com>
Subject: export import bytea from psql
Date: 2017-05-09 15:40:41
Message-ID: CAFj8pRB3n1EAWAHzsT7S063deZGrrbZcvps9QB0jtbvyrD0Odg@mail.gmail.com
Lists: pgsql-hackers
Hi
This is a topic I have been trying to solve for some time, and I would like to
open the discussion before the summer commitfest.

Working with binary data, long (multiline) text fields, or long XML that is
not UTF8-encoded is not well supported by the psql console. It is not hard to
use the binary protocol and write a custom application, but that needs some
time and some experience. Built-in support in psql could help a lot of users.
Export
=====
Last time, the most contentious point was the mapping of a bytea field to a
file. Does it make sense to push more than one bytea field into one file? It
is an implementation detail - just raise (or don't raise) an exception when
the count > 1. A 1:1 mapping is a little cleaner and simpler from the user's
perspective, but then some use cases are not possible and files have to be
joined at the operating system level. Can we find an agreement?
Design
----------
SELECT xxx
\gstore file -- use text protocol
SELECT xxx
\gbstore file -- use binary protocol
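
A session with the proposed commands could look like the sketch below. The
commands do not exist yet, and the table, column, and file names are only
illustrative:

    -- export a text document; client encoding conversion applies
    SELECT doc FROM documents WHERE id = 1
    \gstore ~/doc.txt

    -- export an image byte-for-byte using the binary protocol
    SELECT avatar FROM users WHERE id = 1
    \gbstore ~/avatar.jpg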
Theoretically we could support more target files in one command, but I think
that is over-engineering. If somebody needs it, they can write a small app in
Perl, Python, ...
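
For comparison, the "small app" mentioned above can be sketched in Python.
The psycopg2 part is shown only as comments because it needs a running server
(the DSN and table are hypothetical); the self-contained part is just the
file-writing helper:

```python
# Sketch of a tiny bytea export script. The database access is commented out
# because it needs a live server; only the helper below runs standalone.
#
# import psycopg2
# conn = psycopg2.connect("dbname=test")           # hypothetical DSN
# cur = conn.cursor()
# cur.execute("SELECT avatar FROM users WHERE id = %s", (1,))
# value = cur.fetchone()[0]                        # bytea arrives as memoryview
# write_bytea_to_file(value, "avatar.jpg")

def write_bytea_to_file(value, path):
    """Write a bytea column value (bytes or memoryview) to a file verbatim."""
    with open(path, "wb") as f:
        f.write(bytes(value))

if __name__ == "__main__":
    # stand-in for a fetched bytea value
    write_bytea_to_file(memoryview(b"\x89PNG\r\n"), "/tmp/demo.bin")
```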
Import
=====
There are several possible ways:
1. using psql variables - we just need two new commands, \setfrom and
\setfrombf - this would be a very simple implementation. The value can be used
multiple times. On the other hand, the loaded variable can hold a lot of
memory (and this can be invisible to the user). Another disadvantage: when a
SQL command using this value fails, the full SQL command can be written to the
log (there is a high risk of log bloat).
2. using psql variables together with query parameters - the first part is
the same as in option 1, but this needs support for parameterized queries.
This is a partially optimized variant of option 1 - the risk of log bloat is
reduced.
3. using a special \gexec-like command where query parameters are replaced by
the specified content. Something like:

insert into foo values($1, $2)
\gusefiles xml:~/xx.xml bytea:~/avatar.jpg

This is a one-step command, so the work can be faster, and there is zero risk
of forgotten content in a variable. On the other hand, the command is more
complex.
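
Put side by side, the variants above could look like this in a session (all
commands are proposed, none exist yet; file and table names are made up):

    -- variant 1: load file content into psql variables first
    \setfrom xmlvar ~/xx.xml
    \setfrombf avatar ~/avatar.jpg
    INSERT INTO foo VALUES (:'xmlvar', :'avatar');

    -- variant 3: one-step command binding files to query parameters
    insert into foo values($1, $2)
    \gusefiles xml:~/xx.xml bytea:~/avatar.jpg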
Binary import with the binary protocol is risky if the content doesn't respect
the protocol. I don't think we need to support unknown (arbitrary) formats or
custom data types. Only known formats need to be supported: text, json, xml
(text or binary), bytea (text or binary). Using the binary protocol for the
XML format enforces an automatic conversion.
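
To illustrate what is at stake for bytea: with the text protocol the server
sends the hex representation (the default bytea_output since 9.0), which a
client has to decode before the file is usable, while the binary protocol
carries the raw bytes directly. A minimal sketch of the decoding step:

```python
# Text-format bytea output (bytea_output = 'hex') looks like '\x89504e47...';
# the binary protocol would deliver the same payload as raw bytes.

def decode_hex_bytea(text_value: str) -> bytes:
    """Decode the text-protocol hex form '\\x...' into raw bytes."""
    if not text_value.startswith("\\x"):
        raise ValueError("expected hex bytea format")
    return bytes.fromhex(text_value[2:])

if __name__ == "__main__":
    raw = decode_hex_bytea("\\x89504e470d0a")
    print(raw)  # the first bytes of a PNG header
```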
Opinions? Notes?
Regards
Pavel