SDS:Backups

From Tetherless World Wiki


Plan for backing up escience and tetherless.

Directories to Back Up

  • /ext/opt/var
  • /ext/project
  • /ext/opt/tomcat/tomcat/conf
  • /etc/httpd/conf
  • /etc/httpd/conf.d

Requirements

  • Backups should be staggered
    • there should be daily backups going back 2 weeks
    • there should be weekly backups going back 4 months
    • there should be monthly backups kept indefinitely
  • Permissions on files and directories should be preserved
  • The backup script(s) should be run on a set schedule (cron)
  • Adding a new directory to the backup listing should require only editing a single configuration file or adding a cron job with the path as a parameter
  • The backup process should be scheduled during an inactive time (3am or so) and complete before 8am
  • Backups must be stored on a separate disk & filesystem from the source. Optionally, backups can be stored on a remote system.
  • Backups should be compressed to save space
  • Once a year the monthly backups of the past year should be saved to DVD, BD, tape, or other long-term archive material and stored in an appropriate location.

Design

Potential file/directory backup mechanisms:

rsync directory/file, compress all
Pros:

  • Does delta/diff updates to out-of-sync copies (fast)
  • Can be configured to retain permissions

Cons:

  • Does not support compression of a file hierarchy; every file will have to be compressed separately
  • The fast delta updates to out-of-sync copies do not provide much benefit unless you overwrite a previous backup
  • Does not provide a mechanism for backing up only deltas; it only uses deltas to make fast updates
  • If used with SSH as the transport to sync with a copy on a remote system, the account that rsync logs in as on the remote system must be able to write the correct ownership and permissions.
    • This may require rsync to log in as root on the remote machine to preserve permissions (I really, really don't think we should do this).

tar directory/file, compress
Pros:

  • Can be configured to retain permissions
  • A single tarball of many entities can be compressed
  • Independent of the transmission mechanism to the separate disk

Cons:

  • Does not easily provide a mechanism for backing up only deltas
  • Does not follow symbolic links by default

Errata:

  • By default, tar archives a symbolic link as a link and does not archive the link's target. If the target lies outside the directories being backed up, its content will not be in the archive. With the -h (--dereference) option, tar archives the target of the link instead of the link itself. This could be viewed as beneficial for a backup, but the symbolic links will be lost if the backup is used to recover the system content. Also, if multiple symbolic links to the same target are backed up, multiple copies of the target may be archived.
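
As a rough illustration of the tar approach, here is a minimal sketch using Python's standard tarfile module instead of the tar command line. The function name create_archive is hypothetical and not part of the plan above. The sketch assumes the archive is written outside the directory being backed up. The tarfile module records each file's mode bits and, with dereference=False, stores symbolic links as links.

```python
import os
import tarfile


def create_archive(source_dir, archive_path):
    """Write source_dir into a gzip-compressed tarball at archive_path.

    Hypothetical helper for illustration; the real script may simply
    shell out to tar instead.
    """
    # dereference=False (the default) stores symbolic links as links
    # rather than copying their targets into the archive.
    with tarfile.open(archive_path, "w:gz", dereference=False) as tar:
        # arcname keeps only the directory's base name inside the archive,
        # so /ext/opt/var is stored as var/...
        base = os.path.basename(source_dir.rstrip(os.sep))
        tar.add(source_dir, arcname=base)
    return archive_path
```

Restoring ownership as well as mode bits generally requires extracting as a sufficiently privileged user, which is the same concern raised for rsync above.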

Potential transport mechanisms:

SSH
Pros:

  • Can send a command over an SSH connection
  • Secure
  • Allows us to run filesystem commands on a remote system that is not NFS mounted
  • Works with rsync

Cons:

  • Logging in over SSH as root should be prohibited

NFS mounted filesystem
Pros:

  • Will not require an SSH connection to write to the external filesystem.
  • Managing backup directories on the external filesystem is as easy as working with the local filesystem.
  • Should be faster than transferring over SSH

Cons:

  • May take a long time before this is set up.

Potential script mechanisms:

Config file for backup targets:
The script that runs the backups should determine which directories and files to back up by reading a configuration file. The config file to use should be passed in as a command-line parameter with the following syntax '-c <path/to/config>'.

The script should use tar instead of rsync to generate backups because compressing tar archives is simpler than compressing the results of an rsync and because we will not be overwriting past backups.

The configuration file should use the following syntax:

  • pound character (#) to denote the beginning of a comment
  • a label followed by some number of spaces or tabs followed by the path to the resources to back up followed by a newline
    • the label is used in naming the tar archive and cannot contain a forward slash, tab, or space

An example of a configuration file:

# backup for /ext/opt/var
opt-var /ext/opt/var
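
The config syntax above could be parsed along these lines. This is a sketch only: the function name parse_backup_config is hypothetical, and it makes two assumptions that the syntax rules do not state. It treats a '#' anywhere on a line as starting a comment, so paths containing '#' are not supported, and it rejects duplicate labels.

```python
import re


def parse_backup_config(text):
    """Return a dict mapping each label to the path it backs up."""
    targets = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Assumption: '#' starts a comment wherever it appears on the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # The label ends at the first run of spaces or tabs.
        parts = re.split(r"[ \t]+", line, maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<label> <path>'")
        label, path = parts
        if "/" in label:
            raise ValueError(f"line {lineno}: label may not contain '/'")
        if label in targets:
            # Assumption: duplicate labels are an error.
            raise ValueError(f"line {lineno}: duplicate label {label!r}")
        targets[label] = path
    return targets
```

Called on the example file above, this returns a single entry mapping opt-var to /ext/opt/var.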

Script configuration of backup archive destination:

If the backup destination is an NFS mounted directory, the script should take the backup destination root path as an input parameter with the following syntax '-d <path/to/backup/destination>'. I decided not to put this in the config file, both to simplify the config-file logic and so that a cron entry can be read at a glance to see where the backups are being saved.
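
The '-c' and '-d' parameters could be handled with Python's standard argparse module, roughly as follows. The function name build_parser, the attribute names config and destination, and the paths in the comment are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the -c and -d options described above."""
    parser = argparse.ArgumentParser(
        description="Create compressed tar backups of configured paths."
    )
    parser.add_argument(
        "-c", dest="config", required=True, metavar="CONFIG",
        help="path to the backup configuration file",
    )
    parser.add_argument(
        "-d", dest="destination", required=True, metavar="DEST",
        help="root path of the backup destination",
    )
    return parser


# Hypothetical cron entry for a 3am run (script and paths are placeholders):
#   0 3 * * * /usr/local/bin/backup.py -c /etc/backup.conf -d /mnt/backup
```

With both paths on the command line, the cron entry alone shows what is backed up and where it is saved.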


If the backup destination is on a remote system accessed via SSH, the username and host for the SSH connection and the destination root path on the remote system can be passed as input parameters. The user will have to configure the source and destination systems for key-based, passwordless SSH login. The script will then transfer the compressed tarballs to the remote system with scp and send filesystem commands over ssh.
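
For the SSH case, the script would mostly be assembling scp and ssh command lines. The sketch below only builds the argument lists; all function names are hypothetical, and the 'daily' subdirectory is an assumption about the remote layout. BatchMode=yes is a standard OpenSSH option that makes the command fail instead of prompting for a password, which suits a cron job.

```python
import shlex
import subprocess


def scp_command(archive_path, user, host, remote_root):
    """Argument list that copies one archive into <remote_root>/daily/."""
    remote_dir = remote_root.rstrip("/") + "/daily/"
    return ["scp", "-o", "BatchMode=yes", archive_path,
            f"{user}@{host}:{remote_dir}"]


def ssh_mkdir_command(user, host, remote_path):
    """Argument list that creates a directory on the remote system."""
    # shlex.quote protects the path from the remote shell.
    remote_cmd = "mkdir -p " + shlex.quote(remote_path)
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_cmd]


def run(command):
    """Run a command list and raise CalledProcessError on failure."""
    subprocess.run(command, check=True)
```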


Using NFS is MUCH preferred over SSH.


Script-based staggered backup hierarchy and maintenance:

The backup script will manage the backup archives in the backup destination root. The backup destination root will contain the directories daily, weekly, monthly, and latest. For every entry in the config file, a compressed tarball will be created using the naming scheme <label>-<xsd:dateTime>.tar.gz. This file will be moved to the daily directory, and a symbolic link pointing at it will be created in the 'latest' directory with the naming scheme <label>-current.tar.gz. Note that it is important that all labels be unique so as not to overwrite the 'current' symlink of another backup.
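
The naming scheme and the 'latest' symlink might look like this in practice. This is a sketch under assumptions: the xsd:dateTime is written without a timezone offset, the link target is relative so the backup root can be remounted elsewhere, and the function names are hypothetical.

```python
import os
from datetime import datetime


def archive_name(label, when):
    """Return <label>-<xsd:dateTime>.tar.gz for the given datetime."""
    # Assumption: local time with no timezone suffix.
    return f"{label}-{when.strftime('%Y-%m-%dT%H:%M:%S')}.tar.gz"


def update_latest_link(dest_root, label, archive_file):
    """Point latest/<label>-current.tar.gz at daily/<archive_file>."""
    latest_dir = os.path.join(dest_root, "latest")
    os.makedirs(latest_dir, exist_ok=True)
    link = os.path.join(latest_dir, f"{label}-current.tar.gz")
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    # The relative target is resolved from the 'latest' directory.
    os.symlink(os.path.join("..", "daily", archive_file), tmp)
    # Renaming over the old link swaps it without a window where it is missing.
    os.replace(tmp, link)
    return link
```

For example, archive_name("opt-var", datetime(2010, 3, 1, 3, 0, 0)) gives opt-var-2010-03-01T03:00:00.tar.gz.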


Backup archive maintenance will be run after the new archives have been generated. This logic could be in another script that is called by the backup script (preferable) or contained within the backup script itself. The maintenance logic will ensure that archives from the first day of the month are moved or copied to the monthly directory, archives from the first day of the week are moved or copied to the weekly directory, daily archives past expiration (2 weeks) are deleted, and weekly archives past expiration (4 months) are deleted.
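
The maintenance decisions can be sketched as two pure functions that inspect an archive's file name. Several details here are assumptions rather than part of the plan: Monday is treated as the first day of the week, four months is approximated as 120 days, archive names carry a timestamp like 2010-03-01T03:00:00, and the function names are hypothetical.

```python
import re
from datetime import date, datetime, timedelta

DAILY_KEEP = timedelta(days=14)    # two weeks of daily archives
WEEKLY_KEEP = timedelta(days=120)  # assumption: four months is about 120 days

NAME_RE = re.compile(
    r"^(?P<label>[^/\s]+)-(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.tar\.gz$"
)


def archive_date(name):
    """Return the date embedded in an archive name, or None if unparseable."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    return datetime.strptime(match.group("stamp"), "%Y-%m-%dT%H:%M:%S").date()


def promotions(name):
    """Return the directories a daily archive should also be copied to."""
    day = archive_date(name)
    if day is None:
        return []
    targets = []
    if day.weekday() == 0:  # assumption: the week starts on Monday
        targets.append("weekly")
    if day.day == 1:
        targets.append("monthly")
    return targets


def is_expired(name: str, tier: str, today: date) -> bool:
    """True if an archive in the given tier is older than that tier keeps."""
    day = archive_date(name)
    if day is None:
        return False  # leave unrecognized files alone
    if tier == "daily":
        return today - day > DAILY_KEEP
    if tier == "weekly":
        return today - day > WEEKLY_KEEP
    return False  # monthly archives are never deleted
```

Keeping these decisions free of filesystem calls means the separate maintenance script can be checked against fixed dates before it is allowed to delete anything.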
