LCSR File Services

Updated September 10, 2003

LCSR provides file space for user files on a NetApp F820 filer for DCIS administrative, research, graduate and undergraduate users as well as the DCIS instructional lab and the BioInformatics lab.

Our goal is to provide reliable file access with an availability of 99.99%. File restores should be accomplished within three hours of being requested when operators are on duty.
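To make the availability goal concrete, the short calculation below converts a percentage into a downtime budget. It is only a sketch: the source does not state the period over which availability is measured, so the one-year window and the helper name `allowed_downtime_minutes` are assumptions made for illustration.

```python
# Sketch: translate an availability target into allowed downtime.
# Assumption: availability is measured over a non-leap calendar year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted within the period."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return period_minutes * unavailable_fraction


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.99)
    print(f"99.99% over one year allows about {budget:.1f} minutes of downtime")
```

Under that assumption, the goal leaves a little under an hour of total downtime per year.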

The public announcement of the NetApp fileserver describes the filer and gives backup and restore information.

Monitoring and Usage

New entries in the NetApp log files are summarized hourly, and staff read these summaries periodically during normal business hours. Usage is summarized nightly in two places. NetApp usage statistics reviews user and snapshot space history for all volumes on the NetApp and lists restores done from NetApp snapshots, along with the age of those files at the time of restore. NetApp disk usage statistics shows the current number of physical disks in use and available, gives a more detailed review of current volume status, and lists qtrees (separate "filesystems" within volumes), where they live, and what they are used for.

Authentication

Files on the NetApp can be accessed by two methods.

NFS access (Sun's Unix filesystem style) is controlled on a hostname basis from the NetApp. Users are able to access their files subject to owner, group, and world permissions just as they would on a local filesystem.
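The owner, group, and world check mentioned above follows the general Unix model, which can be sketched as below. This illustrates the conventional rule (the first matching class decides), not the filer's actual implementation; the function name `may_read` and the numeric IDs are hypothetical, and the superuser and any ACLs are deliberately ignored.

```python
import stat


def may_read(mode: int, file_uid: int, file_gid: int,
             uid: int, gids: set) -> bool:
    """Sketch of the classic Unix read-permission check.

    Exactly one class applies: owner if the UIDs match, otherwise group
    if the file's group is among the caller's groups, otherwise world.
    Assumption: no superuser override and no ACLs.
    """
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    # Mode 0640: owner read/write, group read, world nothing.
    print(may_read(0o640, 1000, 100, uid=1000, gids={100}))  # owner
    print(may_read(0o640, 1000, 100, uid=1002, gids={200}))  # world
```

Note that the owner class takes precedence even when it is more restrictive: a file with mode 0044 is unreadable by its owner although group and world may read it.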

CIFS (SMB) access (Microsoft's filesystem protocol) is granted to users who authenticate through LCSR's domain servers. The Windows user is then mapped to a corresponding Unix user for access and protection purposes. Windows users are only granted mount access to qtrees on which they have Unix home directories.

Backup procedures

The NetApp filer uses a filesystem (WAFL) that allows instantaneous snapshots of the filesystem state to be saved. Until changes take place on the live filesystem, these snapshots take no additional space on the filer. Snapshots can be navigated in the same fashion as the regular filesystem, and files can be restored from any snapshot in which they exist.
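Why a snapshot costs no space until the live filesystem changes can be illustrated with a toy model in which a snapshot merely copies the table of block references, not the blocks themselves. This is a deliberately simplified sketch, not a description of WAFL's on-disk structures; the class `ToyVolume` and all of its methods are hypothetical.

```python
from itertools import count


class ToyVolume:
    """Toy model: files point at blocks; snapshots share those blocks."""

    def __init__(self):
        self._next_block = count()
        self.blocks = {}      # block id -> data
        self.live = {}        # path -> block id (the live filesystem)
        self.snapshots = {}   # label -> frozen copy of the path table

    def write(self, path, data):
        # Every write goes to a fresh block; old blocks are never overwritten.
        block_id = next(self._next_block)
        self.blocks[block_id] = data
        self.live[path] = block_id

    def snapshot(self, label):
        # Copies references only, so no extra data blocks are needed yet.
        self.snapshots[label] = dict(self.live)

    def restore(self, label, path):
        # Point the live file back at the block the snapshot still holds.
        self.live[path] = self.snapshots[label][path]

    def blocks_in_use(self):
        used = set(self.live.values())
        for table in self.snapshots.values():
            used.update(table.values())
        return len(used)


if __name__ == "__main__":
    vol = ToyVolume()
    vol.write("notes.txt", "first draft")
    vol.snapshot("daily.0")
    print(vol.blocks_in_use())   # snapshot shares the live block
    vol.write("notes.txt", "second draft")
    print(vol.blocks_in_use())   # the old block is now held only by the snapshot
```

In this model, space is consumed by a snapshot only once the live file diverges from it, which mirrors the behavior described above.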

Snapshots are currently scheduled for 0000, 0800, 1200, 1600, and 2000. The non-midnight snapshots (called "hourlies") are kept for about 2 days. Those taken at midnight (called "dailies") are kept for two weeks. Each week, the "daily" snapshot taken at midnight between Sunday and Monday becomes a "weekly" and is kept for 9 weeks (about 2 months). Beyond this point, no restore is possible on the NetApp.
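The retention rules above can be sketched as a small classifier that answers whether a given snapshot would still be available for a restore. It is an approximation under stated assumptions: "about 2 days" is treated as exactly 48 hours, the weekly snapshot is taken to be the one stamped 0000 on Monday, and the names `classify` and `is_restorable` are hypothetical.

```python
from datetime import datetime, timedelta

SNAPSHOT_HOURS = (0, 8, 12, 16, 20)  # schedule from the text above

# Assumption: "about 2 days" is modeled as exactly 48 hours.
RETENTION = {
    "hourly": timedelta(days=2),
    "daily": timedelta(weeks=2),
    "weekly": timedelta(weeks=9),
}


def classify(taken_at: datetime) -> str:
    """Return 'hourly', 'daily', or 'weekly' for a scheduled snapshot time."""
    if taken_at.hour not in SNAPSHOT_HOURS or taken_at.minute or taken_at.second:
        raise ValueError("not a scheduled snapshot time")
    if taken_at.hour != 0:
        return "hourly"
    # weekday() == 0 is Monday, i.e. the Sunday night to Monday morning snapshot.
    return "weekly" if taken_at.weekday() == 0 else "daily"


def is_restorable(taken_at: datetime, now: datetime) -> bool:
    """True if the snapshot exists and is still within its retention window."""
    return taken_at <= now < taken_at + RETENTION[classify(taken_at)]


if __name__ == "__main__":
    monday_midnight = datetime(2003, 9, 8, 0, 0)
    print(classify(monday_midnight))
    print(is_restorable(monday_midnight, datetime(2003, 10, 20, 9, 0)))
```

A practical consequence is that a file changed and then deleted between two snapshots is never captured, and that the finest restore granularity degrades from hours to days to weeks as the file ages.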

In a standard NetApp configuration, users can navigate these snapshots of the filesystem. Because snapshots are unwritable copies of the filesystem at a specific point in time, a file that should be private can remain publicly accessible in a snapshot even after its permissions are corrected on the live filesystem. The only remedies are to remove the entire snapshot (eliminating the possibility of restore from that point in time) or to hide snapshots from users. In consultation with the LCSR Users Group, we have decided to restrict viewing of snapshots to the operations staff for the performance of restores.

Preparations for catastrophe

On a monthly basis, all volumes with live user files on them are dumped to tape. These dumps are intended only for restoring volumes after a volume failure or a total NetApp failure, and they are kept for two months.

File System Risk Assessment