Professional web site development, maintenance and support services.
PHP-PostgreSQL Tips
Welcome 38.103.63.18
9 May 2008 17:01:30 MDT |
ubi dubium ibi libertas
Potts Software - DatabasesPostgreSQL Tips
PHP PGSQL Tips:
Dynamically assign variable name-value pairs from a select query in PostgreSQL
//select result set:
$result = pg_exec($pgcx,"select
*
from table_name
order by column_name
");
// Get the number of fields that occur in the result set
$numfields = pg_num_fields($result);
// uncomment below to echo the number of fields found
//echo $numfields." Fields in this table";
// get the number of rows found
$numrows = pg_numrows($result);
// loop through our result set
for($nr=0; $nr<$numrows; $nr++)
{
//loop through each result set and
//assign variable names and values using
//our number of fields as the constraint
//and the name of the fields as var names
for($fn=0; $fn<$numfields; $fn++)
{
//load an array with each field name
$fname[$fn] = pg_field_name($result,$fn);
// create variables with field names and
//assign values from slash stripped result set
$$fname[$fn] =
stripslashes(pg_result($result,$nr,$fname[$fn]));
//uncomment the below to see the variable name - value pairs
// here is where you see the fields and what they are called if you need
//echo "$".$fname[$fn]." = ";
//echo $$fname[$fn];
}
//echo a variable-value pair to test (regular output prints here)
//echo "Field_Name: ".$field_name;
}
Using Copy and \CopyReprinted from http://techdocs.postgresql.org/techdocs/usingcopy.phpBy: Jeff Eckermann
Using PostgreSQL's COPY function effectively
Last updated 17th September 2001
Introduction
PostgreSQL's "COPY" statement is an excellent way to load large
amounts of data quickly into a database.
The main PostgreSQL documentation is the definitive guide,
and I recommend studying it.
The documentation is quite easy to understand, once you
know it :-). If you are like me, it probably looks a
little forbidding at first. I hope the following notes,
based on my (sometimes painful) experiences with COPY,
will help make your experience with this useful utility
more pleasant and rewarding.
Basic basics
Here is the syntax for COPY, from the main docs:
COPY [ BINARY ] table [ WITH OIDS ]
FROM { 'filename' | stdin }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]
COPY [ BINARY ] table [ WITH OIDS ]
TO { 'filename' | stdout }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]
COPY will be run by the PostgreSQL backend (user "postgres").
The backend user requires permissions to read & write to the
data file in order to copy from/to it, and needs to be able
to find it.
Therefore you need to use an absolute pathname. You will also
need to have insert/update or select permission on the table
in order to COPY to or from it.
Assuming these requirements are met, you are ready to happily
populate your database by doing:
COPY bleah FROM 'blah';
from the psql prompt.
TIP: Try creating a test table, inserting some data
like that which you want to import. Then COPY the content
to a file and inspect the results. This will tell you a lot
about what COPY wants.
Problems
COPY is not terribly smart. All it really does is split
each line using the appropriate delimiters, and attempt
to line up each field in your data with each field in
the table, counting from the left. If the parser accepts
your data, in it goes. No parsing is done by COPY itself,
so the data is handed off in just the state you have it.
I find problems usually fall into the following
(somewhat overlapping) categories:
* Data type mismatches
* NULL confusion
* Rogue characters
You might be amazed (or maybe not) at the kinds of things
that DB users are capable of inserting into your data.
Be prepared to do some editing to get your data accepted
by COPY (more on this later). Here are some examples of
the kinds of problems that I have encountered:
* Embedded delimiters
This is all too common, especially with user-created memo
text. COPY expects tabs as delimiters by default, but
you can specify something else with "USING DELIMITERS 'whatever'".
If you have extra delimiter characters, COPY will find
too many fields to fit your table, and.... you can guess
the rest. Most often this will show as a data type mismatch,
as COPY attempts to stuff your text string into a date
field or something similar.
* Backslash characters
I wonder what would possess someone to terminate their
data entry with a backslash? It really happens!
This means trouble, because the following delimiter is
thereby escaped, and no longer recognized as a delimiter.
All the fields get shifted down by one, with the result
that (a) your COPY fails because of a data type mismatch,
or (b) your data is silently accepted in a mangled state.
Either way, not good.
* Non-printable characters
Difficult to catch, because you can't see them. Something
to watch out for especially if your file is in fixed width
format, as this means spaces or tabs have been used to
line up your data. These will be happily accepted by
the parser for text type fields, but not for number or date types.
If you are importing from Windows, be aware that some
Windows-based software will accept spaces into number
or date fields, and these may exist in your file.
Other non-printable characters can be even worse, because
they don't take up space, and are therefore practically
invisible. If your COPY fails for this reason, you can
find them by doing something like this:
grep 'search pattern' datafile | vis -lw
This will show you every non-printable character
in the affected lines.
* Carriage return characters
If your data file was created on, or passed via a Windows
machine, chances are that your lines are terminated by
CR/LF combinations. If the CR is not removed, it will
end up in the final field of your table. If that field
is a number or date data type, your COPY will fail. If
the field is a character data type, your COPY will succeed,
and you will then be scratching your head trying to work
out why comparisons using that field give such strange results.
* NULL confusion
COPY expects NULLs to be represented as "\N" (backslash-N)
by default. You can change this by using
"WITH NULL AS 'something_else'". If you have empty fields
in your data, I strongly recommend using "WITH NULL AS '' ".
Otherwise, COPY will assume empty fields represent empty
strings, and will bomb on the first empty number or date
field. Note that you cannot mix NULL representations!
* Just the wrong data format
Your data must match the format required by the relevant
PostgreSQL data type. See the docs.
For example, if your integer looks like this:
1,203,327
it will not be accepted. The commas will need to be
edited out. PostgreSQL recognizes a variety of date/time
formats. If you are unsure about yours, test a sample.
Fixing Problems
If your data file is large, using an interactive editor
is not a good idea. Most times, your edits will require
some scripting. If you are not already familiar with
scripting in tools like sh, sed, awk or perl, time spent
on learning will be rewarded. You can do a lot with a
little knowledge. Coverage of these methods is beyond
the scope of this little article (you probably know more
than I do anyway!).
Carriage returns
You can avoid these altogether by moving your data files
via FTP, using the "ascii" transfer method. This will
automatically adjust the line endings for you.
You can just delete the carriage returns using a simple
script: " tr -d '\r' < datafile " should work just fine.
I like to use GNU recode, which has lots of other nifty
functionality as well. The command for that is recode /cl
datafile. There are lots of other possible methods,
mainly similar to these. Choose according to your taste.
My terminal emulator (SecureCRT) allows me to copy and
paste quite large amounts of data directly from my Windows
workstation, using "COPY FROM stdin;". It even renders
MS Excel cells into lines of tab delimited text, with
appropriate line endings! If you are using a terminal
emulator, you may want to investigate its capabilities.
Rogue characters
Your choice is to delete them, perhaps with an innocuous
replacement, or to escape them. This requires inserting
a backslash before each offending character. If you
really want to keep your data just the way it came to you
(like I usually do), this is the way to go. Note that
this is still subject to the limits imposed by the
applicable PostgreSQL data type.
Others
Other situations, such as embedded delimiters or incorrect
data formats, require that you find some pattern in your
data which will allow you to identify precisely what
needs to be changed.
i.e. You will need to delete the commas in "1,203,327",
but you probably don't want to delete all of the commas
in the whole line.
Today's popular scripting languages offer good methods
for doing this kind of thing.
Other Bits & Quirks
You can also use the psql client version of COPY, which
is "\copy". The syntax is slightly different:
1. Being a psql command, it is not terminated by a semicolon
2. File paths are relative to your user directory
(and need not be quoted)
3. You cannot specify alternative delimiters.
*this seems to not be the case in the version I am using,
as I successfully output from a table and loaded a new
one using delimiters and \copy
e.g.
\copy table_name to './table_name.txt' using delimiters '|'
\copy table_name from './table_name.txt' using delimiters '|'
-RP
Generally I find \copy works well. The only quirk I have
found is it sometimes fails to return an error message
that otherwise you would get from COPY; and sometimes
will hang in that case as well. Check the result carefully
in this case.
The behaviour of COPY is not symmetrical. If you COPY
data into a table already containing data, the new data
will be appended.
If you COPY TO a file already containing data, the
existing data will be overwritten.
If you are doing repeated tests of your COPY, be careful
you do not duplicate the data in your table.
The quickest way to clean out a table after a test is
"TRUNCATE tablename;".
Default values are not filled in by COPY, as an INSERT
would do. You will need to explicitly supply a value
for every field. If you are using a "serial" type field,
you will need to reserve some numbers by resetting the
sequence value:
SELECT setval ('sequence_name', new_value);
You can then insert the reserved numbers into your data
file, and COPY away.
Indexes on your table will slow down your COPY too, perhaps
by an order of magnitude. For a large data file, you may
be better off dropping your indexes and recreating them
afterwards. A script is an easy way to do this.
Because each COPY runs as a single transaction, an extremely
large data file might lead to performance problems. If in
doubt, just split your file into chunks.
When editing, I often like to do a one-shot, like this:
sed -e 'edit_script' data_file | psql -c " copy mydata
from stdin with null as '' " -d mydb&
* When loading a dump file generated from a different
RDBMS, containing INSERT statements, you may be able to
get it to load with little or no editing.
However, there may be some difficulties with large dump files:
o Each INSERT is a separate transaction, with
BEGIN...COMMIT overhead.
o Some INSERTs may fail, but the rest succeed,
leaving you with problems syncronising your data.
You could get around the first problem by splitting the
file and surrounding the pieces with BEGIN...COMMIT
statements. Or you could try a different approach; edit
the dump file to a delimited text file, and use COPY to
import your data.
If you want to try this, here is a Perl script which
works for me.
Check the man page of psql for more information.
By: Jeff Eckermann
from: http://us2.php.net/manual/en/function.ereg-replace.php
dude at d0gz dot net
01-May-2002 12:47
When moving data from Oracle to Postgres, I was having trouble with values containing nulls from the Oracle SELECTS (which causes Postgres fits during an insert i.e. 'Unterminated quoted string').
This solved the problem:
$output=preg_replace('[\x00]','',$input);
|