In search of a backup strategy.

Copyright (c) Oriole Corporation, 1999

Backups are probably the main concern of production DBAs; a concern acknowledged by Oracle Corp, since Oracle8 features a 'Recovery Manager' (RMAN), which certainly boasts very interesting characteristics (and a less than obvious syntax, although a marked improvement on the IBM Job Control Language (JCL) which seems to have inspired it).

However, choosing the right strategy is always a difficult task and, unfortunately, often it's only after having suffered a loss that you realize you had settled on the wrong one. We shan't try to provide definitive answers hereafter - just possible answers but our goal will have been fulfilled if we cause you to ask yourselves the right questions. What this paper will try to cover is how to compare the costs of the different possible strategies, protection against human failure and recovery speed.
What follows, except where explicitly stated, is valid for both Oracle7 and Oracle8.

A backup strategy is exactly the same as an insurance policy.
First, you need to define what risks you want to be protected against. Are you ready to lose some of your data? What do you fear most? Hardware failure? Fire? Any service interruption? Don't forget, too, that many a major problem doesn't stem from hardware but from human failure - such as an upload of the wrong file or an update program wrongly applied twice which will not technically corrupt the database but will nonetheless make a mess of your data.
How fast do you want to be up and running again after a major problem?
All this will define a policy; as with any insurance policy, you will have to pay a premium - possible service disruption induced by backups, additional hardware, operational complexity ...
The main problem is to analyse risks and to balance protection against the price to be paid for something which may never occur - but which may cost you dearly if it does.
You must also know that, just as the premium is likely to be much higher for a Rolls Royce car than for an old pickup truck, the bigger your database and the higher its availability, the more it will cost you.

Physical against logical backup.

You have two main ways to backup a database, which are not mutually exclusive:
There is, however, something which is of great importance : recovery speed. With a physical backup, recovery speed is easy to predict : with old fashioned physical methods - we shall discuss more innovative methods below - it will be of the same order of magnitude as the time taken to backup the database (it may not be exactly the same speed if the source medium is different from the target medium, since reading is faster than writing and performances may vary). It is quite a different matter with a logical backup : with the most commonly used tool, exp/imp, you may expect reloading times at least five times as long as downloading time. The reason is that since indexes are not saved as such, but only as CREATE INDEX SQL statements, rebuilding indexes usually takes an awfully long time. This is much less true with SQL*Loader, which in direct mode can reload roughly as fast as PDQOUT downloads (indexes can be rebuilt as the table is being reloaded, instead of being rebuilt afterwards). A positive benefit, which must be taken into account, is that, by rebuilding your database, you will reorganize it, which is always benefitial to performance. On the negative side you are also introducing an additional degree of uncertainty - something may (and if it may, experience tells us it most probably will) go wrong somewhere. With a brute force copy, you will have relatively few nasty surprises when restoring (unless you have a corrupt backup). With anything which involves reconstruction and reloading, you may run into problems you can usually overcome when you have some experience but which may bring undesirable delays (and a lot of stress for the DBA).

Recovery to the last transaction or not?

When you ask users what they are prepared to lose, they will probably cry "Nothing!"; now, negotiation may bring them to concede that if in the case of a major problem the data is brought back to the state where it was say less than 24 hours ago, although they'll have to re-enter a lot, it can be done. There are critical cases however when this isn't an option and we absolutely have to get back to the state immediately before the system crash - or before this unfortunate upload of a corrupted file. This can be done under certain conditions with physical backups.
In this case your database must operate in ARCHIVELOG mode. In this mode, the redo log files which record synchronously, the changes which have been applied to the database (as opposed to the data files to which changes are applied asynchronously during checkpoints) are archived before being rewritten.
When recovering, this allows you first to restore a backup of your files, then to re-apply all the log files, which have been generated up to the crash or a specific point in time. This means of course, first that you must take a backup from time to time, and second that you must archive and manage everything which has been generated in between backups - and be careful as Oracle can generate LOTS of archive log files; don't forget too, that reapplying them will take some additional time. From Oracle 8 on, applying the archived redo log files can go on as the database is opened; it would, of course, be naive to believe it could be transparent.

Full backup against incremental backup.

A backup can be full (a snapshot, either logical or physical, of the database at a given time) or incremental (only what has changed since the last backup was saved). Oracle has long allowed an incremental export but this is in fact almost useless, as any table which has been modified since the last export is FULLY exported - as a result, you export almost everything except the small reference tables which are rarely updated. The ARCHIVELOG mode is in a way, a kind of incremental backup, but indeed a not very pure one since it mixes physical backup and transaction log. Since Oracle 8.0, incremental physical backup is also available and this is a feature of real, practical interest, as only database blocks which have been modified will actually be backed up. One must keep in mind that the big advantage of an incremental backup is in the speed of daily routine backup operations. However, recovery operations will take longer than recovering a full backup, since you will have to recover first, a full 'reference' backup and then all the small incremental backup files which you will have taken since. Once again, adding complexity adds risk. It also means that if you want to keep a manageable number of files, you must still take a full backup at regular intervals (say once a week with a daily incremental backup).

Cold backup against hot backup.

A backup may be 'cold' which means that the database is shut down, or 'hot' i.e. with a database up and running - and users at work. Note that this is a notion which applies mainly to physical backups; with logical backups, the database HAS to be up for data extraction; however, if bouncing it is an option, it can be made accessible to ordinary users, or solely to DBAs. If ordinary users are working, caution must be taken to extract consistent data.

Cold backups.

As indicated above, a cold backup implies shutting the database down. The traditional way of doing this with small to 'medium' databases (let's say up to 50Gb), is to shut the database down cleanly, then run a script which saves all the files associated with the database, then to start up the database again. The smart way is to generate the script before the shutdown by running a query on the Oracle data dictionary, so as to be certain not to forget the last file which was added in a rush because of rollback segment problems with that terrible batch update. If the night is yours for maintenance operations and if your database is just a few gigabytes big, this is an excellent solution (which you can pair with a full database export, just in case). There may come a time however, when the database is too big for a file copy, even from disk to disk, to complete in the time available for maintenance.
Do you then have to go for hot backups? Well, perhaps not. Some operating systems, such as the Unix available on the big Compaq (ex-DEC) servers provide an interesting feature known as cloning. Oracle provides read consistency when you select from a table while updates are going on : blocks which have been modified since the select started to run are skipped and what they used to contain, is read from rollback segments instead. Filesystem cloning provides a similar functionality : cloning is a very fast operation which takes a virtual copy, but disk blocks are copied to the clone only when they are requested or before being modified after the cloning. You can shut your database down, clone it, restart your database and quietly backup the clone. Another solution is the use of mirrored disks : if your database files are mirrored, you can shut your database down, break the mirror, restart your database, and quietly copy what is in effect a snapshot. Once the copy is done, you just have to resilver the mirror. Of course it means that for the full duration of the copy you will have no mirror, which may not be desirable; but if you are rich enough to afford three sets of disks and to triplicate your data, it would be perfectly safe to use one for backup operations. Don't forget, however, that while you can endeavour to minimize the downtime required by the backup, nothing can be done about the time required for the recovery - you will have to copy the files back into place if you have a major problem, and for all that time the database will be unavailable. Unless, of course, you use removable mirrored disks and if recovery is just a matter of plugging in another set of disks - once again, you have to balance the risk and the cost.

Hot backups.

Hot backups mean taking a backup of the database while it is up and running with transactions going on. Quite obviously, it's sensible not to do the backup at the busiest time of the day. You can't take hot backups without taking any precautions (or to be more accurate, you can but it's only when you try to recover your database that you'll find that the backup is unusable). If you are doing a logical backup, and using exp, there is a CONSISTENT keyword which tells Oracle to keep a consistent view (by the intensive use of rollback segments) of the data being exported. This can lead to rollback segment problems, specifically the (in)famous 'Snapshot too old' message (see the paper on rollback segments on the www.oriolecorp.com site); moreover this works only for the tables in the exp session; if you want to use exp cleverly by running several sessions in parallel, it won't work. Now, when we refer to hot backups in an Oracle context, we chiefly mean hot PHYSICAL backups; and here there is a number of rules to follow.

In this case recovery will necessarily imply reapplying a number of redo log files after the data files have been restored. A free script to run hot backups is available on the www.oriolecorp.com web site.

Replication.

When the crucial point is to be able to resume operations quickly, replication can be a good solution. However, one has to beware not to use asymmetrical replication : if replication is asymmetrical, the databases are not structurally identical, which may be a problem (especially on return to normal operations). Oracle symmetrical replication is something of a monster to set in place, and has never been designed to be used for backing up a database. However, Oracle provides (from Oracle 7.3) a standby database facility, which is a database which is not opened (although from Oracle 8.1 it can be opened in read-only mode), on which a flow of redo log (journal) files coming from another database are applied as they are generated. It can be an excellent solution to be able to switch rapidly to an operating database with a minimum loss of transactions (you will lose the contents of the redo log files you were unable to apply to the standby database - there is always a small lag). Beware however, that if your data is corrupted by a program, the replicated database will be corrupted too; in that case, the only way to get back to a 'clean' state would be a point-in-time recovery winding the clock back to before the program started.

What about generic backup software?

A number of products (among them, but not exclusively, the Legato products which are shipped with Oracle) ease backup administration (such products are required by the Oracle8 Recovery Manager for their tape-handling capabilities). One has to be extremely cautious when using such products, which can be very helpful but may not tell you whether you are using them correctly or incorrectly. Given the importance of Oracle in the market-place, you can be sure that most of them offer some provision for doing things cleanly; but double-check that it is actually done so at your site. As mentioned above, if you take a careless backup of a database which is up and running, it will be totally unusable, and you may not notice it before you actually need to recover your database. If you have any doubt (and enough disk space) what you can do is to backup your database to some other disk, with all the necessary care, and then let the backup software backup your backup rather than the database itself - it will be usable. Moreover, it will allow you to keep the last backup online, which may help you to recover faster.

And a last word ...

Try recovery once, just to check that it works ...

 
previous All you ever wanted to know about exp/imp SQL*Net primer. next page