New World Backup System
Overview
- The NW (New World) Backup System is built around rsnapshot(1) as its underlying technology.
- rsnapshot is a remote filesystem snapshot utility that uses rsync(1).
- rsync is a fast and versatile remote (and local) file and directory copying tool that handles local system snapshots directly, and remote system snapshots using ssh(1).
- The CSE backup system is run on the New World AWS machine: nw-syd-backup1.
- The scripts, logs, and archives associated with NW backups are generally found in the directory nw-syd-backup1:/export/nw-syd-backup1/1/backups/, which will be referred to below as ~backups/.
Operation
- A separate rsnapshot archive is kept of each CSE directory belonging to every CSE user.
- These archives are accessible via the directory:
~backups/users/$LASTTWO/$user[.n]/
- where:
- $user is the name of the CSE user whose directory is stored in the archive.
- $LASTTWO is the last two characters of the user's CSE username (eg: 'in' for the user zain).
- [.n] is a unique numeric suffix assigned to each of the user's different CSE directories (if the user has more than one).
- The copy (or snapshot) of the user's directory that was made N days ago is found in the user's archive directory $user[.n]/, under the directory named daily.N/.
- There are a maximum of 30 daily snapshots of the user's directory stored in each archive, one for each of the last 30 days.
- These daily.N/ directories are renamed (ie: shifted up) each day as they age, until finally the copy made 30 days ago (named daily.30/) is deleted.
- The path:
basename/export/hostname/partition/basename/
is inserted by rsnapshot between the directory daily.N/ and the actual snapshot of the user's directory, and is used to uniquely identify the archive's origin.
- basename will usually be the same as the username.
- The file $user[.n]/.bu_source is also used to identify the archive's origin.
- Each user's backup archive is assigned the same access (ie: ownership and permissions) as their original CSE directory.
- Once a user's CSE account has expired, and/or the user's CSE directory has been removed from its host server, the archive of that directory is also moved aside (into ~backups/users/$LASTTWO/.deleted/).
- However, the daily.N/ directories will continue to be shifted up daily until they have all been removed.
- If the user's CSE directory is restored to the CSE host server before all daily.N/ directories have been removed, then the remaining daily.[N..M]/ directories are renamed daily.[0..M-N]/ and made available to the user once more.
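The renaming of the remaining daily.N/ directories can be illustrated with a short POSIX shell function. This is a sketch only: renumber_dailies is a hypothetical name that is not taken from the backup script, and it assumes the archive is a plain directory whose snapshot directories are all named daily.N/ with a numeric N.

```shell
# Hypothetical sketch (not part of the backup script): renumber the remaining
# daily.[N..M]/ directories of one archive to daily.[0..M-N]/.
# Assumes the archive is a plain directory whose daily.N/ names are numeric.
renumber_dailies() (
    cd "$1" || exit 1
    # Existing snapshot numbers, in ascending numeric order.
    nums=$(for d in daily.*; do
               [ -d "$d" ] && printf '%s\n' "${d#daily.}"
           done | sort -n)
    [ -n "$nums" ] || exit 0                 # no snapshots left
    lowest=$(printf '%s\n' "$nums" | head -n 1)
    [ "$lowest" -eq 0 ] && exit 0            # already starts at daily.0
    # Ascending order is safe: the target daily.(n-lowest) is always either
    # unused or a name already freed by an earlier rename in this loop.
    for n in $nums; do
        mv "daily.$n" "daily.$((n - lowest))" || exit 1
    done
)
```

For example, an archive holding daily.4/, daily.5/ and daily.6/ would end up with daily.0/, daily.1/ and daily.2/.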
Locations
- CSE's backup server
nw-syd-backup1@cse.unsw.edu.au
- ~backups/users/$LASTTWO/$user[.n]/
- This is the location of $user's rsnapshot archives, such that
- $user is the name of the CSE user whose directory is stored in the archive.
- $LASTTWO is the last two characters of the user's CSE username.
- If the user has more than one CSE directory being backed up, then [.n] is a unique numeric suffix assigned to all but one of the user's CSE directories.
- The archive without the numeric suffix should always be the archive that stores the user's main home directory.
- ~backups/users/$LASTTWO/$user[.n]/.bu_source
- identifies the source of the specific user's directory stored in that rsnapshot archive.
- ~backups/users/$LASTTWO/.deleted/$user[.n]/
- If $user's source directory, as identified by the file: ./$user[.n]/.bu_source, no longer exists on the source filesystem, then the user's archive is moved into this .deleted/ directory.
- ~backups/disks/$mountpoint/$LASTTWO/$user[.n]/
- The actual storage location of the backup archives.
- There may be many different $mountpoints on which different disks may be mounted.
- The directory ~backups/users/$LASTTWO/$user[.n]/ is actually a symlink to the directory ~backups/disks/$mountpoint/$LASTTWO/$user[.n]/ that has been allocated to this user's backup archive.
- ~backups/var/
- The location of various files containing (user/dir) data.
- These files group (user/dir) data into different classifications which are used to determine which (user/dir)s are to be snapshotted, and how their archives are to be managed.
- ~backups/lib/
- The location of most of the configuration files used by the NW backup system.
- ~backups/bin/
- The location of most of the scripts used by the backup system, including the main backup script itself (described in the next section).
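The mapping from an archive name to its location can be sketched as follows. The helper names (lasttwo, archive_path, archive_storage) are hypothetical, and the first argument is an assumed stand-in for ~backups/. Note that $LASTTWO is taken from the username, so an archive named zain.1 still lives under users/in/.

```shell
# Hypothetical helpers illustrating the ~backups/users/ layout.
# $1 is an assumed stand-in for the ~backups/ base directory.

# Print $LASTTWO, the last two characters of a username.
lasttwo() {
    printf '%s\n' "${1#"${1%??}"}"
}

# Print the users/ path of an archive named user[.n].
# $LASTTWO comes from the username, so any '.n' suffix is stripped first.
archive_path() (
    user=${2%%.*}
    printf '%s/users/%s/%s/\n' "$1" "$(lasttwo "$user")" "$2"
)

# Print where the users/ symlink actually points, ie: the directory
# allocated under disks/$mountpoint/$LASTTWO/.
archive_storage() (
    user=${2%%.*}
    readlink "$1/users/$(lasttwo "$user")/$2"
)
```

For example, archive_path /export/nw-syd-backup1/1/backups zain.1 prints /export/nw-syd-backup1/1/backups/users/in/zain.1/.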
Backup script
The main backup script is: ~backups/bin/backup
This script contains most of the commands used by the backup system. Many of the commands can be run individually by passing them as arguments to backup, along with their desired options.
The backup -h option produces help documentation describing backup and its constituent commands. This documentation is also duplicated below, with further details where this might be helpful.
- Usage
backup [-h] [-l log] [-n] [-q] command [commandargs]
- Function
- This script deals with CSE rsnapshot backups made for any CSE user who owns a home, or other directory, that is hosted on a CSE file server.
- This script should usually be run on the CSE backup server: nw-syd-backup1.
- Options
- -h [ command | topic ]
- Print help for the specific command or topic passed. If no command or topic is passed, print this general help and exit.
- Topics and commands are summarised below.
- -l logfile
- Log messages to logfile (Default: ~backups/log/)
- -n
- Do not make any changes - just report what would be done.
- -q
- Do not reproduce messages on STDERR.
- command [command_args]
- Run the backup command with its optional command arguments.
- Backup commands are summarised below.
- Full descriptions of commands and their args may be produced by running:
backup -h command
and are also included in what follows, under their own subsections.
- Topics
- general
- Usage: backup [-h] [-l log] [-n] [-q] command [commandargs]
- locations
- Locations used by the CSE Backups System
- overview
- Overview of CSE Backup System
- Commands
- diff
- User related diff.
- fixprimary
- Fix primary home archive names if necessary.
- mkspec
- Prepare backup archive directories and specfiles for users.
- movearch
- Copy backup archives to another backup filesystem.
- movesource
- Change the recorded source directory for a user's backup archive.
- purge
- Shifts and/or deletes archives of expired users.
- rename
- Renames archive(s) of a CSE user whose username has changed.
- report
- Summarise logs to report on the last backups.
- resurrect
- Resurrect a user's inactive backup archive from deleted archives.
- run
- Create and/or update CSE New World user backup archives.
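The general options above follow the usual getopts pattern. The following is a minimal sketch of how such a command line might be split into general options, a command and its arguments. It is not taken from the backup script, and the variable names (HELP, LOGFILE, DRYRUN, QUIET, COMMAND, COMMAND_ARGS) are hypothetical.

```shell
# Hypothetical sketch, not taken from the backup script: split
#   backup [-h] [-l log] [-n] [-q] command [commandargs]
# into general options, a command name and the command's arguments.
parse_general_opts() {
    HELP=0 LOGFILE= DRYRUN=0 QUIET=0
    OPTIND=1                      # allow repeated calls
    while getopts 'hl:nq' opt; do
        case $opt in
            h) HELP=1 ;;
            l) LOGFILE=$OPTARG ;;
            n) DRYRUN=1 ;;
            q) QUIET=1 ;;
            *) echo "usage: backup [-h] [-l log] [-n] [-q] command [commandargs]" >&2
               return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    COMMAND=${1:-}                # with -h: the command or topic to describe
    [ $# -gt 0 ] && shift
    COMMAND_ARGS=$*               # NB: joining with $* loses the original quoting
}
```

With this sketch, 'backup -n -l /tmp/x.log run -u zain' leaves DRYRUN=1, COMMAND=run and COMMAND_ARGS='-u zain', while 'backup -h purge' sets HELP=1 and COMMAND=purge, matching the '-h [ command | topic ]' form. A real implementation would need to preserve the quoting of the command arguments.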
backup commands
run
- Usage
- backup [-genopts] run [-D] [-m MAXSPEC] [-N[123]] [-n] [-p MAXPROC] [-s SCRIPT] [-u user]
- Function
- Create and/or update CSE New World user backup archives, in three steps:
- Step 1: Create lists of all CSE (user/directories), classify them, and fix primary homes if necessary.
- (files in $BU_VAR/)
- (See: fixprimary command below)
- Step 2: Create specfiles:
- Step 2a: Ensure a backup archive exists for each (user/dir).
- (archives in $BU_USERS/)
- Step 2b: Create rsnapshot specfiles for each (user/dir).
- (specfiles in $BU_TMP/)
- Step 3: Use xargs to run consecutive invocations of SCRIPT, passing a different specfile to each process each time.
- (See '-s SCRIPT' option below)
- Options
- -D
- Run an xargs process for each physical disk storing archives.
- This attempts to evenly distribute simultaneous disk activity across all physical disks.
- Default: Only run one xargs process, and ignore details of physical storage disks.
- -m MAXSPEC
- Each xargs process will call SCRIPT no more than MAXSPEC times.
- Default: Call SCRIPT until all specfiles are processed.
- -N1
- Do NOT (re)create user dir lists. (ie: Do not run step 1)
- -N2
- Do NOT (re)create rsnapshot dirs or specfiles. (ie: Do not run steps 2a, 2b)
- -N3
- Do NOT use xargs to call SCRIPT at all. (ie: Do not run step 3).
- -n
- Equivalent to -N3 or '-m 0'.
- -p MAXPROC
- Each xargs process runs MAXPROC simultaneous process sequences of SCRIPT.
- (Default: with '-D': $BU_MAX_INSTANCES_D; without '-D': $BU_MAX_INSTANCES)
- -s SCRIPT
- xargs consecutively invokes this SCRIPT (in step 3),
- passing SCRIPT another specfile each time.
- Default SCRIPT - $CALL_RSNAPSHOT:
- Passes the specfile to rsnapshot with the required options.
- Removes the specfile after rsnapshot has run.
- Note: SCRIPT may be any program. It need not run rsnapshot, nor must it do anything with the specfile passed.
- -u user
- Create the specfiles for just this user (unless '-N2'),
- Run SCRIPT passing just user's specfiles (unless '-n')
- (Default: Create and run specfiles of all CSE users).
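Step 3 of 'run' can be approximated with find and xargs. This sketch is not the real implementation: run_specfiles is a hypothetical name, it assumes specfiles are named *.spec with no whitespace in their paths, its default of 4 processes merely stands in for $BU_MAX_INSTANCES, and it relies on xargs -P, which GNU and BSD xargs support but POSIX does not require.

```shell
# Hypothetical sketch of step 3 of 'run'. Assumes specfiles are named *.spec
# and their paths contain no whitespace.
#   $1 specdir  - directory holding the specfiles (cf. $BU_TMP/)
#   $2 script   - program run once per specfile (cf. '-s SCRIPT')
#   $3 maxproc  - simultaneous SCRIPT processes (cf. '-p MAXPROC'); assumed default 4
#   $4 maxspec  - optional cap on SCRIPT calls (cf. '-m MAXSPEC')
run_specfiles() (
    specdir=$1 script=$2 maxproc=${3:-4} maxspec=${4:-}
    list=$(find "$specdir" -type f -name '*.spec' | sort)
    [ -n "$list" ] || exit 0                 # nothing to do
    if [ -n "$maxspec" ]; then
        [ "$maxspec" -gt 0 ] || exit 0       # '-m 0' behaves like -N3
        list=$(printf '%s\n' "$list" | head -n "$maxspec")
    fi
    # -n 1: one specfile per SCRIPT call; -P: up to maxproc calls at once.
    printf '%s\n' "$list" | xargs -n 1 -P "$maxproc" "$script"
)
```

Because SCRIPT may be any program, 'run_specfiles "$BU_TMP" rm 2 3' is a valid (if destructive) call: it deletes the first three specfiles, up to two at a time, without running rsnapshot at all.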
mkspec
- Usage
- backup [-genopts] mkspec [-D] [-nfs] [file]
- Function
- Prepare backup archive directories and specfiles for users.
- Read the input file of format: 'directory user host', and for each (user/dir) for which host is defined:
- Find or create an rsnapshot archive for (user/dir): ($BU_USERS/$LASTTWO/$user[.n]/)
- Create individualised rsnapshot specfiles (By default in $BU_TMP/)
- If host is not defined, but an rsnapshot archive has been found for this (user/dir), then move the archive dir to $BU_USERS/$LASTTWO/$DELETED_DIR/$user[.n].
- If host is not defined, but no rsnapshot archive is found, then silently ignore the (user/dir).
This is the function called by the 'run' command to perform its steps (2a) and (2b). (See 'backup -h run')
- Options
- -D
- Create the specfiles in $BU_TMP/disk.N/, depending on where the archive is stored.
- -nfs
- The rsnapshot specfile specifies that the directory should be accessed over NFS, rather than via an SSH connection to host (which is the default).
- file
- The input file, of format: 'directory user [host]'. If host is present, then the directory exists on host. (Default file: $BU_VAR/$D_SSH)
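The per-line logic of mkspec can be sketched as below. Everything beyond the three cases described above is an assumption: mkspec_sketch is a hypothetical name, archives are treated as having no [.n] suffix, $DELETED_DIR is taken to be .deleted (as in the Locations section), specfiles are named $user.spec, and the remote login root@host is a guess. The directives written (config_version, snapshot_root, cmd_rsync, cmd_ssh, retain, backup) are standard rsnapshot.conf directives whose fields must be tab-separated; the values are illustrative only.

```shell
# Hypothetical sketch of mkspec's per-line logic. Assumptions (not taken from
# the real script): no [.n] suffixes, DELETED_DIR is ".deleted", specfiles are
# named $user.spec, and the remote login is root@host.
mkspec_sketch() (
    bu_users=$1 bu_tmp=$2 infile=$3 nfs=${4:-}   # any 4th arg acts like -nfs
    while read -r dir user host; do
        [ -n "$user" ] || continue
        lasttwo=${user#"${user%??}"}
        arch=$bu_users/$lasttwo/$user
        if [ -n "$host" ]; then
            # Host defined: find or create the archive, then write a specfile.
            mkdir -p "$arch" || exit 1
            if [ -n "$nfs" ]; then src=$dir/; else src=root@$host:$dir/; fi
            {
                printf 'config_version\t1.2\n'
                printf 'snapshot_root\t%s/\n' "$arch"
                printf 'cmd_rsync\t/usr/bin/rsync\n'
                printf 'cmd_ssh\t/usr/bin/ssh\n'
                printf 'retain\tdaily\t30\n'
                printf 'backup\t%s\t%s/\n' "$src" "$user"
            } > "$bu_tmp/$user.spec"
        elif [ -d "$arch" ]; then
            # No host, but an archive exists: move it aside.
            mkdir -p "$bu_users/$lasttwo/.deleted"
            mv "$arch" "$bu_users/$lasttwo/.deleted/$user"
        fi
        # No host and no archive: silently ignore the (user/dir).
    done < "$infile"
)
```

rsnapshot's default rsync_long_args include --relative, which should be what places the full source pathname under the $user/ destination inside each daily.N/, as described in the Operation section.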
report
Usage:
backup [-genopts] report [-d date]
Function:
Summarise logs to report on the last backups.
Options:
-d date
Report on all backups run on this date. Date is specified as yyyy.mm.dd
*NOT YET IMPLEMENTED*
diff
Usage:
backup [-genopts] diff user
Function:
User related diff.
*NOT YET IMPLEMENTED*
movearch
Usage:
backup movearch (archname tofs | -f file)
Function:
Copy backup archives to another backup filesystem.
The backup archive archname is copied from $BU_USERS/$LASTTWO/$archname (a soft link to its actual location at $FROMFS/$LASTTWO/$archname) to $tofs/$LASTTWO/$archname.
If the copy is successful, the soft link is adjusted to point to the new location.
Options:
archname  The name of the specific archive to be moved. This is usually of the form: 'username[.[0-9]]'
tofs      Full pathname of the destination filesystem (eg: $BU_BACKUP/disks/3)
-f file   Read file containing lines of the form 'archname tofs', and copy and move each archive to tofs.
          A file of '-' causes the script to read from STDIN.
Default: (no options or args) Read from STDIN. (ie: '-f -')
Note:
This command does not change the recorded source of the archive. If you have moved the source of the archive (ie: the location of the user's home dir) from one home directory server/fs to another, then use the movesource command. See: 'backup -h movesource'
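The copy-then-relink behaviour of movearch can be sketched for a single archive as follows. movearch_sketch is hypothetical. It assumes the symlink layout described in the Locations section and uses GNU cp -a, which preserves hard links among the files copied in one invocation (this matters because rsnapshot snapshots share unchanged files through hard links). Like the description above, it leaves the old copy in place.

```shell
# Hypothetical sketch of movearch for one archive. Assumes the
# $BU_USERS/$LASTTWO/$archname -> $FROMFS/$LASTTWO/$archname symlink layout.
movearch_sketch() (
    bu_users=$1 archname=$2 tofs=$3
    user=${archname%%.*}                      # username part of user[.n]
    lasttwo=${user#"${user%??}"}
    link=$bu_users/$lasttwo/$archname
    from=$(readlink "$link") || exit 1        # $FROMFS/$LASTTWO/$archname
    dest=$tofs/$lasttwo/$archname
    [ ! -e "$dest" ] || exit 1                # never copy into an existing dir
    mkdir -p "$tofs/$lasttwo" || exit 1
    # Copy first, and only re-point the soft link if the copy succeeded.
    cp -a "$from" "$dest" || exit 1
    ln -sfn "$dest" "$link"
)
```

The '-f file' form could then be emulated with: while read -r a t; do movearch_sketch "$BU_USERS" "$a" "$t"; done < file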
movesource
Usage:
backup movesource user olddir newdir
Function:
Change the recorded source directory for a user's backup archive.
Details:
CSE backup archives record their source directory in two places:
(1) $BU_USERS/$LASTTWO/$username[.n]/$BU_SOURCEDIR
This record is maintained by 'backup'.
(2) In each retention level archive pathname: $BU_USERS/$LASTTWO/$username[.n]/daily.N/$username/$source_dir/
This source_dir pathname is maintained by 'rsnapshot'.
Use:
Run this command when a user's home (or other directory) has had its pathname changed in any way, either because:
(a) some directories in the pathname were renamed (but the contents were not physically relocated), or
(b) the directory and its contents were relocated/copied from one disk server's file system to another, resulting in a pathname change.
Example:
backup movesource zain /import/kamen/1/zain /import/glass/A/zain
Note:
This command only changes the archive source records in (1) and (2) above. The renaming or relocation described in (a) or (b) is expected to be done separately elsewhere.
If $username has also changed, and not just the source directory's pathname, then you should also use the 'rename' command. (See: 'backup -h rename')
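The two record changes made by movesource can be sketched as follows. movesource_sketch is hypothetical: it works on one archive directory, assumes the .bu_source file ($BU_SOURCEDIR) contains only the source pathname, and assumes each retention directory holds daily.N/$user/$source_dir/ as described above.

```shell
# Hypothetical sketch of movesource for one archive directory. Assumes
# .bu_source holds just the source pathname, and that each daily.N/
# holds daily.N/$user/$source_dir/.
movesource_sketch() (
    arch=$1 user=$2 olddir=$3 newdir=$4
    # (1) Update the record kept by the backup script.
    printf '%s\n' "$newdir" > "$arch/.bu_source" || exit 1
    # (2) Move the snapshot path inside every retention directory.
    for d in "$arch"/daily.*; do
        [ -d "$d/$user$olddir" ] || continue
        mkdir -p "$(dirname "$d/$user$newdir")" || exit 1
        mv "$d/$user$olddir" "$d/$user$newdir" || exit 1
        # Remove parent directories of the old path that are now empty;
        # rmdir -p stops (quietly) at the first non-empty one.
        ( cd "$d/$user" && rmdir -p "$(dirname "${olddir#/}")" 2>/dev/null )
    done
    exit 0
)
```

Applied to zain's archive with the example's arguments, daily.N/zain/import/kamen/1/zain/ becomes daily.N/zain/import/glass/A/zain/ in every retention directory, and the emptied import/kamen/ branch is removed.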
rename
Usage:
backup rename oldusername newusername
Function:
Renames archive(s) of a CSE user whose username has changed.
Details:
In general, backup archives belonging to username have the following access pathname:
$BU_USERS/$LASTTWO/$username[.n]/daily.N/$username/$source_pathname/
This command changes all occurrences of $username in such access pathnames from 'oldusername' to 'newusername', also changing $LASTTWO where necessary.
Note:
This command will not change any further occurrences of $username within $source_pathname. If $source_pathname has changed in any way, then you should also use the 'movesource' command. (See: 'backup -h movesource')
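A sketch of the renaming described above, for plain archive directories. rename_sketch is hypothetical and deliberately simplified: it ignores the users/ to disks/ symlink indirection (a real implementation would also need to rename the allocated directory on disk) and, like rename itself, leaves $source_pathname untouched.

```shell
# Hypothetical sketch of rename for plain archive directories under BU_USERS.
# Ignores the users/ -> disks/ symlinks; only renames the archive names and
# the daily.N/$username/ level, never anything inside $source_pathname.
rename_sketch() (
    bu_users=$1 old=$2 new=$3
    oldtwo=${old#"${old%??}"}
    newtwo=${new#"${new%??}"}
    mkdir -p "$bu_users/$newtwo" || exit 1
    for arch in "$bu_users/$oldtwo/$old" "$bu_users/$oldtwo/$old".*; do
        [ -d "$arch" ] || continue
        # Keep any [.n] suffix, eg: zain.1 -> zara.1
        suffix=${arch#"$bu_users/$oldtwo/$old"}
        target=$bu_users/$newtwo/$new$suffix
        [ ! -e "$target" ] || exit 1          # never overwrite an archive
        mv "$arch" "$target" || exit 1
        for d in "$target"/daily.*; do
            if [ -d "$d/$old" ]; then mv "$d/$old" "$d/$new" || exit 1; fi
        done
    done
    exit 0
)
```

For example, renaming zain to zara moves users/in/zain/ and users/in/zain.1/ to users/ra/zara/ and users/ra/zara.1/, and renames each daily.N/zain/ to daily.N/zara/.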
resurrect
Usage:
backup resurrect user [sourcedir]
Function:
Resurrect a user's inactive backup archive from deleted archives. If sourcedir is passed, only resurrect the user's inactive backup archive coming from sourcedir, otherwise resurrect all inactive backup archives belonging to user.
Details:
Active CSE users have backup archives kept in:
(a) $BU_USERS/$LASTTWO/username[.n]/
Expired CSE users have their backup archives made inactive and moved to:
(b) $BU_USERS/$LASTTWO/$DELETED_DIR/username[.n]/
If an expired CSE user has their account reactivated, then this command reactivates their backup archive by moving it from (b) to (a).
Note:
This command only resurrects the backup archive (if it exists), so that the user's home directory(s) may be backed up once more. It does not restore the user's original home directory from the backup archive. Run the command 'restore' to do this.
The source of the resurrected backup archives is assumed to be unchanged. If the user's original directories are restored into different file systems/sourcedirs, then in addition to resurrecting the user's original archive, you may need to use the 'movesource' and/or 'rename' command.
See Also:
'backup -h purge' - for what happens to deleted archives.
'backup -h movesource' - for moving archive sources.
'backup -h rename' - for changing user names.
'backup -h restore' - for restoring user directory from archives.
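Moving archives from (b) back to (a) can be sketched as below. resurrect_sketch is hypothetical. It assumes $DELETED_DIR is .deleted and that .bu_source contains only the source pathname, which it uses for the optional sourcedir filter, and it refuses to overwrite an active archive of the same name.

```shell
# Hypothetical sketch of resurrect. Assumes DELETED_DIR is ".deleted" and that
# .bu_source holds just the source pathname (used for the sourcedir filter).
resurrect_sketch() (
    bu_users=$1 user=$2 sourcedir=${3:-}
    lasttwo=${user#"${user%??}"}
    deleted=$bu_users/$lasttwo/.deleted
    for arch in "$deleted/$user" "$deleted/$user".*; do
        [ -d "$arch" ] || continue
        if [ -n "$sourcedir" ]; then
            [ "$(cat "$arch/.bu_source" 2>/dev/null)" = "$sourcedir" ] || continue
        fi
        name=${arch##*/}                      # user or user.n
        # Refuse to overwrite an active archive of the same name.
        [ ! -e "$bu_users/$lasttwo/$name" ] || exit 1
        mv "$arch" "$bu_users/$lasttwo/$name" || exit 1
    done
    exit 0
)
```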
purge
Usage:
backup [-genopts] purge [-u user] [-m MAX] [-n] [-v]
Function:
Shifts and/or deletes archives of expired users. This command:
a) Shifts the retention directories of all inactive/deleted archives stored in $BU_USERS/$LASTTWO/$DELETED_DIR/
b) Removes inactive/deleted archives with no retention directories left.
Options:
-u user
Just shift and/or purge rsnapshot archives belonging to this user. (Default: All user archives in $DELETED_DIR/ directories)
-m MAX
Only shift and/or purge MAX users.
-n Do not actually shift or remove any rsnapshot archives.
Just list deleted archives on STDOUT.
-v verbose listing - include disk usage
Background:
When CSE users are expired, their backup archives are moved from
$BU_USERS/$LASTTWO/user[.n]/
to $BU_USERS/$LASTTWO/$DELETED_DIR/user[.n]/
This command shifts the retention directories in these expired users' backup archives, and removes the archives when they eventually become empty.
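The daily shift that purge applies to one inactive archive can be sketched as follows. purge_one is hypothetical. It assumes retention directories daily.0/ to daily.29/, so that, as described in the Operation section, the copy shifted to daily.30/ is deleted, and the archive itself is removed once no daily.N/ directories remain.

```shell
# Hypothetical sketch of how purge might shift one deleted archive.
# Assumes retention directories daily.0/ .. daily.29/.
purge_one() (
    arch=$1
    rm -rf "$arch/daily.30"      # should not exist; keeps the mv below from nesting
    n=29
    while [ "$n" -ge 0 ]; do
        if [ -d "$arch/daily.$n" ]; then
            mv "$arch/daily.$n" "$arch/daily.$((n + 1))" || exit 1
        fi
        n=$((n - 1))
    done
    rm -rf "$arch/daily.30"      # the copy made 30 days ago
    # Remove the whole archive once no retention directories are left.
    for d in "$arch"/daily.*; do
        [ -d "$d" ] && exit 0
    done
    rm -rf "$arch"
)
```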
fixprimary
Usage:
backup fixprimary
Function:
Fix primary home archive names if necessary. For every CSE user with a CSE home and a CSE archive, ensure the user's primary home directory is stored in their primary user archive (ie: not stored in an archive with a '.N' suffix).
This command uses renamearch to rename archives if necessary.
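For a single user, the check performed by fixprimary might look like the sketch below. fixprimary_one is hypothetical: how backup decides which directory is the primary home is not described here, so it is passed in as an argument, and .bu_source is assumed to contain only the source pathname. The archives are swapped through a temporary name with plain mv; the real command uses renamearch, whose interface is not documented here.

```shell
# Hypothetical sketch of the fixprimary check for one user. Assumes the
# primary home pathname is already known and that .bu_source holds just
# the source pathname. userdir stands for $BU_USERS/$LASTTWO.
fixprimary_one() (
    userdir=$1 user=$2 primary_home=$3
    main=$userdir/$user
    # Nothing to do if the unsuffixed archive already holds the primary home.
    [ "$(cat "$main/.bu_source" 2>/dev/null)" = "$primary_home" ] && exit 0
    for arch in "$userdir/$user".*; do
        [ -d "$arch" ] || continue
        [ "$(cat "$arch/.bu_source" 2>/dev/null)" = "$primary_home" ] || continue
        # Swap names via a temporary name so the primary home ends up
        # in the unsuffixed archive.
        tmp=$userdir/.fixprimary.$$
        mv "$arch" "$tmp" || exit 1
        if [ -e "$main" ]; then mv "$main" "$arch" || exit 1; fi
        mv "$tmp" "$main"
        exit $?
    done
    exit 0
)
```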