In general, we back up system filesystems on critical servers and designated user filesystems (e.g., user home directories and specially allocated filesystems for specific groups). For non-critical servers and desktop machines, we feel that reinstalling the machine is the easiest recovery method in case of disk failure. File space accessible to users on those machines, while good for high-speed local access, is regarded as temporary and is not backed up.
"Incremental" dumps are recycled each week. The "full" dumps are kept for longer. Originally, all "fulls" were kept for 1 month. The first "full" of a month was designated a "monthly." "Monthlies" were kept for 6 months. "Monthlies" done in July and January were designated "semi-annuals" and "annuals" and kept for the life of the tape (typically, several years).
With the advent of the new law regarding public access to government records, we have changed the period for which we keep backups. This law makes us responsible for searching for all copies of files relevant to requests made by the public. In consultation with University attorneys, other departments, and our users group, we have severely limited the number of backups we keep. Our policy is now to keep backups for a maximum of two months.
We do not back up user PCs or Macs.
Users are expected to back up their own data if it is critical.
Recently, several companies have implemented snapshotting filesystems
(allowing for pictures of the filesystem to be frozen in time) on
RAIDed disks (providing, among other things, highly reliable
filesystems).
We are now using a NetApp fileserver to provide backed up filespace at
a cost of about $55/GB.
Three years ago, providing disk space on a Sun with backup to tape ran
about $225/GB.
So we're now providing faster file service on more reliable hardware
with quicker file restore turnaround time for about 1/4 the cost.
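The "about 1/4" figure follows directly from the two per-gigabyte costs quoted above. A minimal sketch of the arithmetic, where the variable names are illustrative only:

```python
# Per-gigabyte costs quoted in the text above (US dollars).
netapp_cost_per_gb = 55      # NetApp fileserver, backed-up filespace
sun_tape_cost_per_gb = 225   # Sun disk with backup to tape, three years earlier

# Ratio of the new cost to the old cost, and the saving per gigabyte.
ratio = netapp_cost_per_gb / sun_tape_cost_per_gb
savings_per_gb = sun_tape_cost_per_gb - netapp_cost_per_gb

print(f"cost ratio: {ratio:.2f}")            # roughly one quarter
print(f"saving per GB: ${savings_per_gb}")
```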
A regular schedule of snapshots increases the likelihood we will be
able to restore a user's file to a usable state.
Snapshots are done approximately every 4 hours.
These are kept for about 2 days.
Snapshots done at midnight are kept for 2 weeks,
except for those done at Sunday-Monday midnight, which are kept for 2
months.
Snapshots allow us to keep a better variety of potential backup copies
of a file for the two months we keep them.
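The retention schedule above can be expressed as a small rule set. The following is only a sketch of that schedule, not the NetApp's own configuration: the function names are hypothetical, "2 months" is approximated as 60 days, and a midnight snapshot is assumed to be one taken during hour 0.

```python
from datetime import datetime, timedelta

# Retention periods taken from the schedule described above.
KEEP_REGULAR = timedelta(days=2)     # ordinary snapshots, about every 4 hours
KEEP_MIDNIGHT = timedelta(weeks=2)   # snapshots taken at midnight
KEEP_WEEKLY = timedelta(days=60)     # Sunday-Monday midnight; "2 months" assumed to be 60 days


def retention(taken: datetime) -> timedelta:
    """Return how long a snapshot taken at `taken` is kept."""
    if taken.hour == 0:  # assumption: midnight snapshots fall in hour 0
        # Sunday-Monday midnight carries Monday's date (weekday() == 0).
        if taken.weekday() == 0:
            return KEEP_WEEKLY
        return KEEP_MIDNIGHT
    return KEEP_REGULAR


def is_expired(taken: datetime, now: datetime) -> bool:
    """True if the snapshot has outlived its retention period."""
    return now - taken > retention(taken)
```

For example, a snapshot taken at 04:00 on a Tuesday would be expired three days later, while one taken at midnight that same Tuesday would still be available.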
Every month volumes on the NetApp are backed up to tape.
This is not for user file restore.
It is only for emergency restore in case of a RAID failure on a volume
of the NetApp.
These backups do not contain snapshots, so while we can
restore a volume on the NetApp to its state sometime in the previous
month should a catastrophic failure occur, snapshots on the restored
volume would be lost.
Policy: LCSR maintains CD backup facilities for users of our
faculty computing systems. These backups act as a 'checkpoint of last
resort' for users of our faculty systems.
Procedure: Users who wish it have their home directories
written onto CD-ROM or Data-DVD (using the appropriate media based on
total home directory size) every two months. The backup is an
'opt-in' mechanism; each user is asked if they
wish to opt in before each backup (users may permanently opt in or
out if they wish). Backups are either delivered to the individual
user or picked up by them, at their direction. Once picked up or
delivered, the storage of these backup media is the responsibility of
the user.
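The media choice and opt-in rule in the procedure above might be sketched as follows. The capacities are assumptions (a typical 700 MB CD and a 4.7 GB single-layer DVD); the policy itself does not state them, nor does it say what happens when a home directory exceeds one DVD, so that case is flagged rather than guessed. The function name is hypothetical.

```python
from typing import Optional

CD_CAPACITY_MB = 700     # assumed capacity of one CD-ROM
DVD_CAPACITY_MB = 4700   # assumed capacity of one single-layer Data-DVD


def choose_media(home_dir_mb: float, opted_in: bool) -> Optional[str]:
    """Pick backup media for one user, or None if the user opted out."""
    if not opted_in:
        # No disk is made for users who do not want a backup.
        return None
    if home_dir_mb <= CD_CAPACITY_MB:
        return "CD-ROM"
    if home_dir_mb <= DVD_CAPACITY_MB:
        return "Data-DVD"
    # The written procedure does not cover this case.
    raise ValueError("home directory exceeds one DVD; not covered by the procedure")
```

Under these assumptions, a 300 MB home directory would go to CD-ROM and a 2 GB one to Data-DVD.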
Risks: Risks fall into three categories
Risk: Moderate
Remedies: A checklist of directories is kept by the operator
performing the backups; it is in turn checked by a supervisor; the
operator watches the backups occur, and the backup software checks
itself to see that the backup has occurred.
Risk: Moderate
Remedies: The backup software performs basic checks of backup
integrity; backups are tested against 'live' directories after
the backups are done.
Risk: Low
Remedies: The disks are kept by a supervisor until they are picked
up or delivered (at the user's option); thereafter the security of the
disks is the responsibility of the owner of the files, who has the
most personal interest in the safety of the disks anyway. Disks which
are not given to the owner are destroyed by a 'disk shredder'.
Further, users are polled before the backups are done. Disks are not
made for those who are not interested in having a backup of their
files, thus precluding 'orphan' disks.
Backups by snapshots
Backups to tape have always been a time- and resource-consuming
endeavor.
As disk sizes have grown and costs have dropped, tape technology has
always played "catch up."
We have observed that the cost in time and tape hardware of backing up
disk space has never scaled as easily as disk sizes have increased.
Some sort of disk-to-disk backup scheme has therefore always seemed
an attractive alternative.
NetApp filesystems
As of September 2003, we have about 1.1 TB of usable disk storage on
the NetApp.
Using snapshots for backups instead of backing up to tape reduces the
time spent doing tape backups for DCS while simultaneously reducing
the time needed to restore files when requested by a user.
Backups to CD
Policies and Procedures for CD Backups