New World Backup System

Revision as of 10:28, 7 February 2024 by Zain (talk | contribs) (→‎Scripts)

Overview

  • The NW (New World) Backup System is built around rsnapshot(1) as its underlying technology.
    • rsnapshot is a remote filesystem snapshot utility that uses rsync(1).
    • rsync is a fast and versatile remote (and local) file and directory copying tool that:
      • handles local system snapshots directly, and
      • handles remote system snapshots using ssh(1).
  • The CSE backup system is run on the New World AWS machine: nw-syd-backup1.
  • The scripts, logs, and archives associated with NW backups are generally found in the directory nw-syd-backup1:/export/nw-syd-backup1/1/backups/, which will be referred to as ~backups/

Operation

  1. A separate rsnapshot archive is kept of each CSE directory belonging to every CSE user.
    These archives are accessible via the directory:
    ~backups/users/$LASTTWO/$user[.n]/
    where:
    • $LASTTWO is the last two characters of the user's CSE username,
    • $user is the name of the CSE user whose directory is stored in the archive.
    • [.n] is a unique numeric suffix assigned to each of the user's different CSE directories (if the user has more than one).
  2. The copy (or snapshot) of the user's directory that was made N days ago is found in the user's archive directory $user[.n]/, under the directory named daily.N/.
    • Each archive stores a maximum of 30 daily snapshots of the user's directory; the N in daily.N/ is the number of days since that snapshot was made.
    • These daily.N/ directories are renamed (ie: shifted up by one) each day as they age, until finally the copy made 30 days ago (named daily.30/) is deleted.
  3. The path: basename/export/hostname/partition/basename/ is always inserted by rsnapshot between each directory daily.N/ and the actual snapshot/copy of the user's directory.
    • There does not seem to be a way of avoiding this pathname being inserted by rsnapshot given the particular organisation of user archives we use in the NW backups.
    • basename will usually be the same as the username.
    • Although this pathname may be used to uniquely identify the archive's origin, the file $user[.n]/.bu_source is more conveniently used to do this.
  4. Each file and directory in the user's backup archive is assigned the same access (ie: ownership and permissions) as it had in the user's original CSE (home) directory.
    Note however:
    • The filesystems storing the CSE archives are mounted read-only outside of the backup server.
    • This means that no user can change the contents or access permissions of any file or directory in their archive, even if they seem to have permission to do so.
    • It also means that users may be denied access to files or directories in their archive if these were originally copied with the wrong access permissions.
    In this case, the users will need the help of the system staff (root) to access these files/directories.
  5. Once a user's CSE account has expired, and/or the user's CSE directory has been removed from its host server, the user's archive of their directory will also be moved aside
    from ~backups/users/$LASTTWO/$user[.n]/
    to ~backups/users/$LASTTWO/.deleted/$user[.n]/.
    However:
    • The daily.N/ directories will continue to be shifted up daily until they have all been removed.
    • If the user's CSE directory is restored to the CSE host server before all daily.N/ directories have been removed, then the remaining daily.[N..M]/ directories will be renamed daily.[0..M-N]/, and made available to the user once more.
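
The archive layout described in the points above can be sketched as a few shell helpers. This is only an illustration: the function names lasttwo, archive_dir and snapshot_dir, and the placeholder default for BU_USERS (standing in for ~backups/users), are assumptions and are not taken from the backup scripts.

```shell
# Hypothetical helpers; BU_USERS stands in for ~backups/users.
BU_USERS=${BU_USERS:-/path/to/backups/users}

# Print $LASTTWO: the last two characters of a username.
# Assumes the username has at least two characters.
lasttwo() {
    user=$1
    # ${user%??} is the name without its last two characters;
    # stripping that prefix leaves only the last two.
    printf '%s\n' "${user#"${user%??}"}"
}

# Print the archive directory for a user and an optional numeric suffix.
archive_dir() {
    user=$1 n=${2:-}
    printf '%s/%s/%s%s\n' "$BU_USERS" "$(lasttwo "$user")" "$user" "${n:+.$n}"
}

# Print the snapshot directory made N days ago: snapshot_dir user N [suffix]
snapshot_dir() {
    printf '%s/daily.%s\n' "$(archive_dir "$1" "${3:-}")" "$2"
}
```

For example, snapshot_dir zain 3 prints the daily.3/ directory inside the archive for user zain. The rsnapshot-inserted path from point 3 still has to be appended to reach the copied files.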

Locations

CSE's backup server
nw-syd-backup1@cse.unsw.edu.au
~backups
nw-syd-backup1:/export/nw-syd-backup1/1/backup/
~backups/users/$LASTTWO/$user[.n]/
This is the location of $user's rsnapshot archives, such that:
  • $LASTTWO is the last two characters of the user's CSE username,
  • $user is the name of the CSE user whose directory is stored in the archive.
  • If the user has been allocated more than one CSE directory, then [.n] is a unique numeric suffix assigned to all but one of the user's CSE directories.
The archive without the numeric suffix should always be the archive that stores the user's main home directory.
  • $user[.n] is actually a soft link to the storage location of the backup archive: ~backups/disks/$mountpoint/$LASTTWO/$user[.n]
~backups/users/$LASTTWO/$user[.n]/.bu_source
This file identifies the source of the user's directory that is stored in this rsnapshot archive. It will most likely be of the form: /import/bach/1/username
~backups/users/$LASTTWO/.deleted/$user[.n]/
If $user's source directory, as identified by the file: $user[.n]/.bu_source, no longer exists on the source filesystem, then the user's archive is moved into this .deleted/ directory.
~backups/disks/$mountpoint/$LASTTWO/$user[.n]/
This is the actual storage location of the backup archive for $user[.n].
There may be many different $mountpoint directories, each of which represents a separate (possibly virtual) disk storage filesystem on which users' archives may be stored. This provides a way of managing disk space allocation.
The directory: ~backups/users/$LASTTWO/$user[.n]/ is actually a symlink to the actual directory in ~backups/disks/$mountpoint/$LASTTWO/$user[.n] that has been allocated to this user's backup archive.
~backups/var/
The location of various files containing (user/dir) data.
These files group (user/dir) data into different classifications which are used to determine which (user/dir)s are to be snapshotted, and how their archives are to be managed.
~backups/lib/
The location of most of the configuration files used by the NW backup system.
~backups/bin/
The location of most of the scripts used by the backup system, including the main backup script itself (described in the next section).
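
As a rough illustration of how the locations above fit together, the following sketch prints where an archive is really stored and which directory it backs up. The function name describe_archive is hypothetical, and the sketch assumes that .bu_source holds the source path as plain text and that readlink supports -f (as in GNU coreutils).

```shell
# Hypothetical helper: describe_archive ARCHIVE
# ARCHIVE is a path such as ~backups/users/$LASTTWO/$user[.n]
describe_archive() {
    archive=$1
    # The users/ entry is a symlink into disks/$mountpoint/.
    printf 'storage: %s\n' "$(readlink -f "$archive")"
    # .bu_source records the original directory (assumed plain text).
    if [ -r "$archive/.bu_source" ]; then
        printf 'source:  %s\n' "$(cat "$archive/.bu_source")"
    fi
}
```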

Scripts

runbackups.sh

This is the main script, which calls all the other backup-related scripts.

Location
~backups/bin/runbackups.sh
Called by
/etc/cron.d/rsnapshot

backup

This is the main script that contains most of the bespoke commands used by the backup system.

Location
~backups/bin/backup
Called by
runbackups.sh

Many of the commands within backup can be run individually by passing them as arguments to backup, along with their desired options.

The backup -h option produces help documentation describing backup and its constituent commands. This documentation is also duplicated below, with additional details where this might be helpful.

Man Page

Usage
backup [-h] [-l log] [-n] [-q] command [commandargs]
Function
This script deals with CSE rsnapshot backups made for any CSE user who owns a home, or other directory, that is hosted on a CSE file server.
This script should usually be run on the CSE backup server: nw-syd-backup1.
Options
-h [ command | topic ]
Print help for the specific command or topic passed. If no command or topic is passed, print this general help and exit.
Topics and commands are summarised below.
-l logfile
Log messages to logfile (Default: ~backups/log/)
-n
Do not make any changes - just report what would be done.
-q
Do not reproduce messages on STDERR.
command [command_args]
Run the backup command with its optional command arguments.
Backup commands are summarised below.
Full descriptions of commands and their args may be produced by running: backup -h command. They are also included in what follows, under their own subsections.
Topics
general
Usage: backup [-h] [-l log] [-n] [-q] command [commandargs]
locations
Locations used by the CSE Backups System
overview
Overview of CSE Backup System
Commands
diff
User related diff.
fixprimary
Fix primary home archive names if necessary.
mkspec
Prepare backup archive directories and specfiles for users.
movearch
Copy backup archives to another backup filesystem.
movesource
Change the recorded source directory for a user's backup archive.
purge
Shifts and/or deletes archives of expired users.
rename
Renames archive(s) of a CSE user whose username has changed.
report
Summarise logs to report on the last backups.
resurrect
Resurrect a user's inactive backup archive from deleted archives.
run
Create and/or update CSE New World user backup archives.

backup commands

run

This is the main backup command, which actually runs the rsnapshot backups for each CSE user.

Usage
backup [-genopts] run [-D] [-m MAXSPEC] [-N[123]] [-n] [-p MAXPROC] [-s SCRIPT] [-u user]
Function
Create and/or update CSE New World user backup archives.
  1. Create lists of all CSE (user/directories) in ~backups/var/.
    • Classify these directory sources
    • Fix primary homes if necessary (See: fixprimary command below)
  2. Create specfiles
    1. Ensure a backup archive exists for each (user/dir).
      (archives in $BU_USERS/)
    2. Create rsnapshot specfiles for each (user/dir).
      (specfiles in /var/tmp/backups/)
  3. Use xargs to run consecutive invocations of SCRIPT, passing a different specfile to each process each time.
    (See '-s SCRIPT' option below)
Options
-D
Run an xargs process for each physical disk storing archives.
This attempts to evenly distribute simultaneous disk activity across all physical disks.
Default: Only run one xargs process, and ignore details of physical storage disks.
-m MAXSPEC
Each xargs process will call SCRIPT no more than MAXSPEC times.
Default: Call SCRIPT until all specfiles are processed.
-N1
Do NOT (re)create user dir lists. (ie: Do not run Function step 1)
-N2
Do NOT (re)create rsnapshot dirs or specfiles. (ie: Do not run Function steps 2.1, 2.2)
-N3
Do NOT use xargs to call SCRIPT at all. (ie: Do not run Function step 3).
-n
Equivalent to -N3 or '-m 0'.
-p MAXPROC
Each xargs runs MAXPROC simultaneous process sequences of SCRIPT.
Default:
  • With '-D': 1;
  • Without '-D': 6
-s SCRIPT
xargs consecutively invokes this SCRIPT (in step 3),
passing SCRIPT another specfile each time.
Default SCRIPT - ~backups/bin/run_rsnapshot:
  1. Passes the specfile to rsnapshot with the required options.
  2. Removes the specfile after rsnapshot has run.
Note: SCRIPT may be any program. It need not run rsnapshot, nor must it do anything with the specfile passed.
-u user
Create the specfiles for just this user (unless '-N2'), and
run SCRIPT passing just this user's specfiles (unless '-n').
(Default: Create and run specfiles of all CSE users.)
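
Step 3 of the run command can be pictured with the sketch below. It is not the actual implementation: the function name run_specfiles and its arguments are hypothetical, and it relies on the -print0, -0 and -P extensions found in GNU and BSD find/xargs.

```shell
# Hypothetical sketch: run_specfiles SPECDIR SCRIPT [MAXPROC]
# Hands every specfile in SPECDIR to SCRIPT, one specfile per invocation.
run_specfiles() {
    specdir=$1 script=$2 maxproc=${3:-6}   # 6 mirrors the documented default without -D
    # -n 1: pass one specfile per call of SCRIPT
    # -P:   keep up to MAXPROC calls running at the same time
    find "$specdir" -type f -print0 |
        xargs -0 -n 1 -P "$maxproc" "$script"
}
```

Because SCRIPT only receives a specfile path as its argument, any program can be substituted for it, which matches the note on the -s option.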

mkspec

Usage
backup [-genopts] mkspec [-D] [-nfs] [file]
Function
Prepare backup archive directories and specfiles for users.
Read the input file of format: 'directory user [host]', and for each (user/dir) for which host is defined:
  1. Find or create an rsnapshot archive for (user/dir): ($BU_USERS/$LASTTWO/$user[.n]/)
  2. Create individualised rsnapshot specfiles (by default in $BU_TMP/)
If host is not defined, but an rsnapshot archive has been found for this (user/dir), then move the archive dir to $BU_USERS/$LASTTWO/$DELETED_DIR/$user[.n].
If host is not defined, and no rsnapshot archive is found, then silently ignore the (user/dir).
This is the function called by the 'run' command to perform its steps 2.1 and 2.2. (See 'backup -h run')
Options
-D
Create the specfiles in $BU_TMP/disk.N/ depending on where the archive is stored.
-nfs
The rsnapshot specfile specifies that the directory should be accessed over NFS, rather than via an SSH connection to host (which is the default).
file
This is the file of format: 'directory user [host]'.
If host is present then the directory exists on host. (Default file: $BU_VAR/$D_SSH)
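
The three cases that mkspec distinguishes can be sketched as follows. This is a simplified illustration only: the function name classify_specs and its output wording are made up, it merely prints what would happen, and it ignores the [.n] suffix handling.

```shell
# Hypothetical sketch: classify_specs USERS_DIR < file
# Reads lines of the form 'directory user [host]' from STDIN.
classify_specs() {
    users_dir=$1
    while read -r dir user host; do
        [ -n "$dir" ] || continue            # skip blank lines
        last=${user#"${user%??}"}            # $LASTTWO
        archive="$users_dir/$last/$user"
        if [ -n "$host" ]; then
            # host defined: an archive and specfile are needed
            echo "backup $user $dir via $host"
        elif [ -e "$archive" ]; then
            # no host, but an archive exists: it would be moved aside
            echo "retire $user $dir"
        else
            : # no host and no archive: silently ignore
        fi
    done
}
```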

report

Usage
backup [-genopts] report [-d date]
Function
Summarise logs to report on the last backups.
Options
-d date
Report on all backups run on this date. Date is specified as yyyy.mm.dd
*NOT YET IMPLEMENTED*

diff

Usage
backup [-genopts] diff user
Function
User related diff.
*NOT YET IMPLEMENTED*

movearch

Usage
backup movearch (archname tofs | -f file)
Function
Copy backup archives to another backup filesystem.
The backup archive archname is copied
from: $BU_USERS/$LASTTWO/$archname
which is a soft link to its actual location at: $FROMFS/$LASTTWO/$archname
to: $tofs/$LASTTWO/$archname.
If the copy is successful, the soft link is adjusted to the new location.
Options
archname
The name of the specific archive to be moved. This is usually of the form: 'username[.[0-9]]'
tofs
Full pathname of the destination filesystem (eg: $BU_BACKUP/disks/3)
-f file
Read file containing lines: 'archname tofs', and copy and move each archive to tofs.
Note: file of '-' will cause the script to read from STDIN.
Default
(no options or args) Read from STDIN. (ie: '-f -')
Note
This command does not change the recorded source of the archive.
If you have moved the source of the archive (ie: the location of the user's home dir) from one home directory server/fs to another, then use the movesource command.
See: 'backup -h movesource'
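
The copy-then-relink behaviour described for movearch might look roughly like this for a single archive. The function name move_archive is hypothetical and the use of cp -a is an assumption. Since rsnapshot shares unchanged files between snapshots through hard links, a real implementation would need a copy method that preserves them, and whether cp -a does so depends on the platform.

```shell
# Hypothetical sketch: move_archive LINK TOFS
# LINK is the soft link $BU_USERS/$LASTTWO/$archname; TOFS is the new filesystem.
move_archive() {
    link=$1 tofs=$2
    archname=$(basename "$link")
    two=$(basename "$(dirname "$link")")      # the $LASTTWO directory
    from=$(readlink "$link") || return 1      # current storage location
    dest="$tofs/$two/$archname"
    [ ! -e "$dest" ] || return 1              # refuse to overwrite
    mkdir -p "$tofs/$two" || return 1
    cp -a "$from" "$dest" || return 1         # archive-mode recursive copy
    # Re-point the link only after the copy has succeeded.
    ln -sfn "$dest" "$link"
}
```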

movesource

Usage
backup movesource user olddir newdir
Function
Change the recorded source directory for a user's backup archive.
Details
CSE Backup archives record their source directory in two places:
  (1) $BU_USERS/$LASTTWO/$username[.n]/$BU_SOURCEDIR
    This record is maintained by 'backup'.
  (2) In each retention level archive pathname: $BU_USERS/$LASTTWO/$username[.n]/daily.N/$username/$source_dir/
    This source_dir pathname is maintained by 'rsnapshot'.
Use
Run this command when a user's home (or other directory) has had its pathname changed in any way, either because:
  (a) Some directories in the pathname were renamed (but the contents were not physically relocated), or
  (b) The directory and its contents were relocated/copied from one disk server's file system to another, resulting in a pathname change.
Example
backup movesource zain /import/kamen/1/zain /import/glass/A/zain
Note
This command only changes the archive source records in (1) and (2) above.
(a) or (b) are expected to be done separately elsewhere.
If $username has also changed, and not just the source directory's pathname, then you should also use the 'rename' command.
(See: 'backup -h rename')

rename

Usage
backup rename oldusername newusername
Function
Renames archive(s) of a CSE user whose username has changed.
Details
In general, backup archives belonging to username have the following access pathname:
$BU_USERS/$LASTTWO/$username[.n]/daily.N/$username/$source_pathname/
This command changes all occurrences of $username in such access pathnames from 'oldusername' to 'newusername', also changing $LASTTWO where necessary.
Note
This command will not change any further occurrences of $username within $source_pathname. If $source_pathname has changed in any way, then you should also use the 'movesource' command.
(See: 'backup -h movesource')
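
To show why $LASTTWO may change along with the username, here is a small sketch that prints the top-level moves a rename would involve without executing them. The function name plan_rename is hypothetical, and the sketch covers only the archive directory names, not the $username component inside each daily.N/ directory.

```shell
# Hypothetical sketch: plan_rename USERS_DIR OLDUSER NEWUSER
# Prints one 'mv' line per archive of OLDUSER; nothing is executed.
plan_rename() {
    users_dir=$1 old=$2 new=$3
    old_two=${old#"${old%??}"}     # $LASTTWO of the old name
    new_two=${new#"${new%??}"}     # $LASTTWO of the new name
    for arch in "$users_dir/$old_two/$old" "$users_dir/$old_two/$old".[0-9]*; do
        [ -e "$arch" ] || continue                 # skip unmatched patterns
        suffix=${arch#"$users_dir/$old_two/$old"}  # '' or '.n'
        echo "mv $arch $users_dir/$new_two/$new$suffix"
    done
}
```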

resurrect

Usage
backup resurrect user [sourcedir]
Function
Resurrect a user's inactive backup archive from deleted archives.
If sourcedir is passed, only resurrect the user's inactive backup archive coming from sourcedir, otherwise resurrect all inactive backup archives belonging to user.
Details
Active CSE users have backup archives kept in:
  (a) $BU_USERS/$LASTTWO/username[.n]/
Expired CSE users have their backup archives made inactive and moved to:
  (b) $BU_USERS/$LASTTWO/$DELETED_DIR/username[.n]/
If an expired CSE user has their account reactivated, then this command reactivates their backup archive by moving it from (b) to (a).
Note
This command only resurrects the backup archive (if it exists), so that the user's home directory(s) may be backed up once more.
It does not restore the user's original home directory from the backup archive. Run the command 'restore' to do this.
The source of the resurrected backup archives is assumed to stay unchanged.
If the user's original directory(s) is restored into different file systems/sourcedirs, then in addition to resurrecting the user's original archive, you may need to use the 'movesource' and/or 'rename' command.
See Also
'backup -h purge' - for what happens to deleted archives.
'backup -h movesource' - for moving archive sources.
'backup -h rename' - for changing user names.
'backup -h restore' - for restoring user directory from archives.
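
The move from (b) to (a) can be sketched as below, together with the renumbering of the surviving daily.[N..M]/ directories to daily.[0..M-N]/ that the Operation section mentions. Whether resurrect itself performs that renumbering is an assumption here. The function name resurrect_archive and the '.deleted' default for the deleted-archives directory are used for illustration only.

```shell
# Hypothetical sketch: resurrect_archive USERS_DIR USER [DELETED_DIR]
resurrect_archive() {
    users_dir=$1 user=$2 deleted=${3:-.deleted}
    two=${user#"${user%??}"}                   # $LASTTWO
    src="$users_dir/$two/$deleted/$user"
    dst="$users_dir/$two/$user"
    [ -d "$src" ] && [ ! -e "$dst" ] || return 1
    mv "$src" "$dst" || return 1
    # Renumber the remaining snapshots, oldest numbers last, so that
    # daily.N..daily.M become daily.0..daily.(M-N).
    next=0
    for n in $(ls "$dst" | sed -n 's/^daily\.\([0-9][0-9]*\)$/\1/p' | sort -n); do
        [ "$n" -eq "$next" ] || mv "$dst/daily.$n" "$dst/daily.$next"
        next=$((next + 1))
    done
}
```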

purge

Usage
backup [-genopts] purge [-u user] [-m MAX] [-n]
Function
Shifts and/or deletes archives of expired users.
This command:
  a) Shifts all inactive/deleted archive directories stored in: $BU_USERS/$LASTTWO/$DELETED_DIR/
  b) Removes inactive/deleted archives with no retention directories left.
Options
-u user
Just shift and/or purge rsnapshot archives belonging to this user. (Default: All user archives in $DELETED_DIR/ directories)
-m MAX
Only shift and/or purge MAX users.
-n
Do not actually shift or remove any rsnapshot archives. Just list deleted archives on STDOUT.
-v
Verbose listing - include disk usage.
Background
When CSE users are expired, their backup archives are moved
from $BU_USERS/$LASTTWO/user[.n]/
to $BU_USERS/$LASTTWO/$DELETED_DIR/user[.n]/
This command shifts the retention directories in these expired users' backup archives and removes the archives when they eventually become empty.
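
One daily shift of an inactive archive, as described above, could be sketched like this. The function name shift_archive is hypothetical. The default of 30 for the oldest retained snapshot follows the daily.30/ figure given in the Operation section.

```shell
# Hypothetical sketch: shift_archive ARCHIVE [MAX]
# Ages every daily.N/ by one day and removes the archive once it is empty.
shift_archive() {
    archive=$1 max=${2:-30}
    [ -n "$archive" ] || return 1
    # The oldest snapshot falls off the end.
    rm -rf "$archive/daily.$max"
    # Rename from the oldest down so that nothing is overwritten.
    n=$((max - 1))
    while [ "$n" -ge 0 ]; do
        if [ -d "$archive/daily.$n" ]; then
            mv "$archive/daily.$n" "$archive/daily.$((n + 1))"
        fi
        n=$((n - 1))
    done
    # Remove the archive when no retention directories remain.
    if ! ls -d "$archive"/daily.* >/dev/null 2>&1; then
        rm -rf "$archive"
    fi
}
```

Because no new daily.0/ is created for an inactive archive, repeated shifts eventually leave it with no retention directories, at which point it is removed.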

fixprimary

Usage
backup fixprimary
Function
Fix primary home archive names if necessary.
For every CSE user with a CSE home and a CSE archive, ensure the user's primary home directory is stored in their primary user archive (ie: not stored in an archive with a '.N' suffix).
This command uses renamearch to rename archives if necessary.