Download PDQOUT_*.tar.Z

 

PDQOUT

Oracle, SQL*Net, SQL*Plus and SQL*Loader are registered trademarks of Oracle Corporation. All other products or company names are used for identification purposes only, and may be trademarks of their respective owners.

Purpose

PDQOUT is a utility which complements SQL*Loader.

SQL*Loader is a very efficient program, able to load into Oracle from a flat file at lightning speed (especially in DIRECT mode). Unfortunately, Oracle has never provided any utility to do the reverse, i.e. to dump the contents of a table into a flat file (apart from spooling under SQL*Plus, which is highly inefficient); it obviously does not help when you want to transfer data from Oracle to a different data management system.

Unfortunately, even when working in a purely Oracle environment, unloading and reloading data is a fairly common operation :

PDQOUT allows you to dump a view to a file, which quite often makes additional processing unnecessary when reloading.

Most of these operations usually need to be done within a very tight timeframe: reorganizations can only take place when all production activity has stopped, and should take place as fast as possible. The same is often true of daily datawarehouse feed.

Until now, most DBA users have had to use the proprietary .dmp format to export the data using the exp utility and to reimport it using imp. While the speed of exp has always been more or less satisfactory, imp is very slow, to the point that when tables grow pretty big it becomes virtually impossible to reload them in the time alloted, especially with very large databases (VLDB).

The PDQOUT/sqlldr combination provides a very efficient way to perform these operations, in only a fraction of the time required by exp/imp.

Installing PDQOUT on your machine

What you download from the Oriole Corporation web-site is a compressed tar file, named

PDQOUT_for_<Operating System>.tar.Z

you first have to uncompress it on your machine :

uncompress PDQOUT_for_<Operating System>.tar.Z

then extract the files from the archive (it will create a PDQOUT directory under the current directory) :

tar xvf PDQOUT_for_<Operating System>.tar

You must then change directory to PDQOUT where you must have 6 files :

READ.ME

pdqout.o

pdqoutid.o

libpdqout.a

mkpdqout

pdqout.mk

These files are all that's required to run a time limited free trial version of PDQOUTor, once you have bought the license from Oriole Corporation, the unrestricted product.

PDQOUT is protected by a software key (a string of digits and letters) which will be forwarded to you on request for the free trial, or on receipt of payment for the unrestricted product.

As the key depends on the machine on which you will run PDQOUT, you must provide Oriole Corporation with two things:

Once you have received your software key, you just have to set the environment variable PDQOUT_KEY to this value, by a command such as :

PDQOUT_KEY=<software key value>; export PDQOUT_KEY

(Bourne shell)

or

setenv PDQOUT_KEY <software key value>

(C shell)

This is best done in a command file which is run when you log on, such as .profile, .login or .cshrc, depending on your environment.

Once the environment variable is set, you can enjoy all the functionality of pdqout.

PDQOUT operating modes

Basically, PDQOUT starts by writing a control file, suitable for later use by SQL*Loader. Then it writes the data itself, either to the very same control file, or to a separate data file, depending on options. If the data file is distinct from the control file, the program forks a special writer process which works in background.

Running PDQOUT

Important : PDQOUT is a utility which queries a number of dictionary tables to be able to collect all the information it needs. Therefore you must provide PDQOUT with the Oracle username/password of an account which has been granted the select any table privilege with Oracle7 or the Oracle8 role which allows querying of the data dictionary (typically, a DBA account).

You run PDQOUT by entering at the operating system prompt :

pdqout <username/password> table=[<ownername>.]tablename

You can control it by specifying a number of parameters, in the very same way as exp or imp. Typing

pdqout help=y

lists all those parameters, which are detailed hereafter.

PDQOUT parameters

As PDQOUT may (optionally) generate up to two files, the PDQOUT parameters can be classified in three categories :

Note that the latest category is more a matter of convenience than the two others; the control file can always be edited and changed before running SQL*Loader,

In the following list, the default value (if there is any) is given between parentheses.

Parameters which define how PDQOUT runs

USERID username/password

As specified above, the username MUST be an Oracle DBA account. USERID= is optional if username/password comes as the first parameter.

PARFILE parameter filename

HELP display help screen (N)

SILENT silent mode (N)

In silent mode, copyright notice, etc. are no longer displayed. Note that all this information is written to the standard error and does not interfere with the data when no data file or control file is specified.

BUFFER data buffer size (655360)

PDQOUT gets the data from Oracle by fetching several rows at once (array fetch), the number of rows being the buffer size divided by the (maximum) row length. If you have a huge memory, you can increase this value to decrease the number of fetches.

SPAWN fork if data file (Y)

If PDQOUT is asked to generate a control file and a datafile, the default behaviour is to fork a separate process to write the datafile (the initial process writes the control file); this two-process architecture has been designed to return quickly to the prompt, and to ease piping into SQL*Loader (see below). However, the default behaviour can be disabled by specifying SPAWN=N if, for instance, you want to start a very large number of PDQOUT processes in parallel, in which case the same process writes both files.

TABLE table or view name

The table or view name can optionally be preceded by an owner name. This is the only mandatory parameter with username/password.

Parameters which specify how to write the datafile

FILE output data file (stdout)

Beware : if CONTROL= (see below) and FILE= is NOT specified, the data will not be written to the standard output but to the control file after a BEGINDATA line.

OPTIMIZE optimize datafile size (Y)

By default, PDQOUT tries to generate as compact a datafile as possible (the datafile size has a big impact on SQL*Loader performance); for instance, NULL columns at the end of a record will not be written at all (no succession of delimiters), and no separator is written after mandatory, fixed-length columns. This is done of course at the detriment of speed, as it requires more processing, but has been retained at the default behaviour as the big issue is, usually, reloading speed. You can specify OPTIMIZE=N if you want (slightly) faster unloading at the expense of a (slightly) bigger datafile.

DELIM field delimiter (X'9')

The default delimiter is a tab; this can be changed at will if your text data contains tabs (Hint : the backspace character (X'8') is a good candidate, as not very likely to be found in text data)

RECSEP record separator (X'A')

The default record separator is a carriage return; note that in any case, you will have a carriage return between records. However, your data (text) may already contain some carriage return characters, which may make things confusing. In this case, you can specify an additional record separator to help SQL*Loader understand where your records start and end. If you keep the default value, the records will be written as :record1record2recordn

If you specify, say RECSEP='$'

record1$

record2$

beginning of record3

end of record3$

etc.

SORT sort the data (N)

By default, the data is downloaded as it comes; however, to speed up reloading, you can ask for an (Oracle) sort. PDQOUT will identify automatically the biggest index and sort the data in the index order (the control file will be updated to reflect it). Note that there is no way to specify a peculiar index. As this option puts a heavy strain on the database, its use is not warmly recommended (note that you always have the option to sort the datafile using the operating system sort).

DATEFMT date format (RRDDDSSSSS)

The default format has been chosen to be as compact as possible; for all dates between 1950 and 2050, the default format loses no information at all, in a small number of bytes. It can be changed either to be more compact (if you are not interested in the time of day, it can be reduced to RRDDD), or to use four-digit years if your table contains dates outside the 1950 to 2050 scope, or to reload the data in a non-Oracle system which requires special date formats.

Parameters which specify the contents of the control file

CONTROL control file (stdout)

If only the control file name is specified, the data will be written to the same file; the reverse is not true (if only the datafile is specified, the control file information will be written to the standard output and will have to be redirected).

The following parameters can be changed later by editing the control file :

INTO destination table

By default, the name of the INTO table in the control file is the same as the source file.

DIRECT direct load (Y)

UNRECOV unrecoverable (Y)

Both DIRECT and UNRECOV refer to SQL*Loader options; note that UNRECOV=Y is ignored if DIRECT=N.

TRUNC load truncate (N)

By default, SQL*Loader is asked to perform a LOAD INSERT, ie it will fail if the table already contains rows.

Examples

pdqout system/manager table=scott.emp > emp.ctl

pdqout system/manager table=scott.emp control=emp.ctl

are strictly equivalent and generate a contol file named emp.ctl which also contains the data inline.

pdqout system/manager table=scott.emp control=emp.ctl data=emp.dat datefmt=MMDDYYYY

generates a datafile and a control file and changes the output format for dates.

Piping data from PDQOUT to SQL*Loader

Quite often data is to be moved from a database on a machine to another database on the same machine (for instance, from the production release n database to the test release n+1 database). By using Unix pipes, you can start loading the target database while unloading is still going on.

You first have to create a named pipe using the mkfifo command :

mkfifo mypipe

Then you start pdqout by specifying the named pipe has the datafile :

pdqout system/manager table=BIG_TABLE control=load.ctl data=mypipe

Note that you do not have to start it in background as you regain control as soon as the load.ctl control file is written.

Then you can change your ORACLE_SID, ORACLE_HOME, etc. (or use a TNS alias, as in the following example) and start SQL*Loader :

sqlldr system/manager@testdb control=load.ctl

and there it goes ...