
Volume level AFS backups at HPC2N

1 Software used

The backup solution is based on the following software:

  • The OpenAFS utility 'vos dump' to produce the AFS volume dumps.
  • Tivoli Storage Manager (TSM) to handle the actual storage of backups. The TSM client needs to be installed and configured on your AFS servers.
  • The locally developed program tsmpipe (available at /afs/hpc2n.umu.se/lap/tsmpipe/), which pipes the output of 'vos dump' directly into TSM so that no local temporary storage is needed.
  • The locally developed script afsbackup.pl (available at /afs/hpc2n.umu.se/lap/scripts/x.x/arch_other/sbin/afsbackup.pl) to glue everything together.

2 Features of this solution

The purpose of this backup solution was to provide a way out from using the now unsupported dsmcafs client for TSM. To avoid having to change the way we do backups again in the near future, we wanted to base it on existing software. Although file-based backups are handy when restoring a single file, volume-level backups are better in terms of backup and disaster recovery duration.

It should be noted that to restore a volume you only need the standard OpenAFS tools and a TSM client installed.

The design choices for this backup solution are heavily inspired by the solution used by PDC ( http://www.pdc.kth.se/ ), with some improvements made possible by rewriting the scripts from scratch.

afsbackup.pl provides the following basic functionality:

  • Each AFS file server backs up its own filesets and stores these into a common node at the TSM server using the ProxyNode functionality. This eliminates the single-backup-machine bottleneck and allows us to cater for filesets moved between fileservers without special handling.
  • Volume selection is based on which volumes have a backup volume. Backup volumes are refreshed as needed, and preferably no other scripts should be doing this.
  • Each volume has a maximum of two active dumps at any time: a full dump and an incremental dump. Basic logic is present to force a full dump, for example when the incremental dump grows large. Dumps are stored in TSM as backup files named /afsbackups/cellname/volumename.dependentdumptime.dumptime, so that the TSM server can handle expiration of old backups using Copy Groups.
  • At the time of this writing, tsmpipe is used to store volume dumps into TSM and to delete volume dumps from TSM. Because other functionality isn't yet implemented in tsmpipe, dsmc (the TSM client) is used to list files stored in TSM and to perform restores.
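The dump naming scheme described above can be illustrated with a few shell helpers. This is only a sketch, not code taken from afsbackup.pl: the function names are hypothetical, and the rule that a dependent dump time of 0 marks a full dump is inferred from the example file names in the restore section of this document.

```shell
# Hypothetical helpers illustrating the naming scheme
#   /afsbackups/cellname/volumename.dependentdumptime.dumptime
# They are not part of afsbackup.pl.

# dump_path CELL VOLUME DEPENDENT_DUMPTIME DUMPTIME
# Print the name under which a dump would be stored in TSM.
dump_path() {
    printf '/afsbackups/%s/%s.%s.%s\n' "$1" "$2" "$3" "$4"
}

# dump_kind PATH
# Print "full" or "incremental".
# Assumption: a dependent dump time of 0 means the dump depends on nothing.
dump_kind() {
    base=${1##*/}            # strip the directory part
    rest=${base%.*}          # drop the trailing .dumptime
    dependent=${rest##*.}    # last remaining field is the dependent dump time
    if [ "$dependent" = 0 ]; then
        echo full
    else
        echo incremental
    fi
}

# dump_volume PATH
# Print the volume name, which may itself contain dots (e.g. L.ase).
dump_volume() {
    base=${1##*/}
    rest=${base%.*}          # drop .dumptime
    echo "${rest%.*}"        # drop .dependentdumptime
}
```

For example, `dump_kind /afsbackups/hpc2n.umu.se/L.ase.0.1169180280` prints `full`, and `dump_volume` on the same name prints `L.ase`. Parsing from the right is what keeps volume names containing dots intact.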

3 Setup

Using this backup solution is relatively simple; as usual with TSM backups, the most complex task is the initial setup.

3.1 Initial setup - TSM Server stuff

Remember that the dumps are stored as backup files. You need to set up the following on your TSM server:

  • Decide on a storage hierarchy (target storage pools and so on).
  • Create a dedicated management class and copygroup, named AFSDUMP.
  • Create a target node for the backups.
  • Edit afsbackup.pl to include that target node.

3.2 Initial setup - AFS stuff

The script currently backs up every volume that has a backup volume. So make sure to run vos backup on all volumes that should be backed up, and that your maintenance scripts run vos backup at volume creation.
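To see which volumes on a file server would be picked up by this rule, the output of vos listvol can be filtered for backup volumes. The helper below is a rough sketch and not how afsbackup.pl itself does the selection; the function name is hypothetical, and it assumes the usual vos listvol line layout of name, ID, type (RW, RO or BK), size and status.

```shell
# volumes_with_backup (hypothetical helper)
# Read `vos listvol` output on stdin and print the name of every volume
# that has a backup volume, i.e. every "<name>.backup" entry of type BK,
# with the ".backup" suffix removed.
# Assumption: the third column of each volume line is the volume type.
volumes_with_backup() {
    awk '$3 == "BK" && $1 ~ /\.backup$/ {
        sub(/\.backup$/, "", $1)
        print $1
    }'
}
```

A possible invocation on a file server would be `vos listvol "$(hostname)" -localauth | volumes_with_backup`. Volumes missing from the result have no backup volume and would therefore not be backed up.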

3.3 Backup client setup

Add each AFS server as a backup node in TSM, just as you would if you were to back up the entire server.

Grant proxynode access for this machine to the AFS backup target node on your TSM server.

Edit your include-exclude file to bind the /afsbackups/ tree to the AFSDUMP management class. You might also want to exclude your /vicep* partitions (unless you use the name-based file server and want an easy disaster recovery). Something like the following should suffice:

exclude.fs /vicep*
include /afsbackups/.../* afsdump

Verify your include-exclude by running:

dsmc q inclexcl

Verify your proxynode config by doing a query; it should not give errors like 'Access Denied' or similar:

dsmc q back -asnode=targetnode -subdir=yes '/afsbackups/*'

Verify that your vos utility is able to do localauth:

vos exa root.cell -localauth

Verify that you have tsmpipe installed:

tsmpipe -h
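Before running the individual checks, it can help to confirm that all three tools are in the PATH of the user that will run the backup. The helper below is a small sketch with a hypothetical function name; it only tests for the presence of the commands and does not replace the functional checks above.

```shell
# missing_tools NAME... (hypothetical helper)
# Print every NAME that cannot be found in PATH, one per line.
# Prints nothing when all tools are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Running `missing_tools dsmc vos tsmpipe` as root should print nothing on a correctly prepared AFS server.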

If this is the first installation you need to create the /afsbackups filespace on the TSM server. We do this by calling tsmpipe manually to store a file and removing it afterwards:

echo hej | env DSM_DIR=/etc/tsm DSM_CONFIG=/etc/tsm/dsm.opt tsmpipe -O '-asnodename=targetnode' -B -c -s '/afsbackups' -f 'testfil' -l 4 -v
env DSM_DIR=/etc/tsm DSM_CONFIG=/etc/tsm/dsm.opt tsmpipe -O '-asnodename=targetnode' -B -d -s '/afsbackups' -f 'testfil' -v

Run afsbackup.pl manually as root to verify that you have all required Perl modules installed. Something like:

afsbackup.pl 2>&1 | tee -a /var/log/afsbackup.log | egrep -i 'ERROR|WARNING'

Add a cron entry and view the log later. We use the following as '/etc/cron.d/run-afsbackup':

# Run AFS backup
0 5 * * * root /usr/local/sbin/afsbackup.pl 2>&1 | tee -a /var/log/afsbackup.log | egrep -i 'ERROR|WARNING'

4 Restoring stuff

Restores can be performed from any machine that has an AFS client and a TSM client installed and is registered as a proxy node. This means that you can perform restores on a machine of your choice as long as these prerequisites are fulfilled.

Until restore has been implemented in tsmpipe, restoring is done in two steps: restoring the dumps from TSM to a local filesystem, followed by restoring the dumps into AFS.

It's important to note that if real disaster has struck, we will be able to restore stuff by only installing an AFS and TSM client. We're not depending on special scripts/software that we store in AFS.

The following example restores the most recent backup of the fileset L.ase. To get older (i.e. inactive) backups, add -inactive to the parameter list of dsmc. -pick might also come in handy.

4.1 Restoring the dump(s) from TSM to local files

First, I do a query to find out how much local disk space I need:

dsmc q back -asnode=afsbackup_node.hpc2n.umu.se '/afsbackups/hpc2n.umu.se/L.ase.*'

This yields the following file list:

             Size      Backup Date        Mgmt Class A/I File
             ----      -----------        ---------- --- ----
API 13,864,956  B  01/19/2007 05:18:30    AFSDUMP     A  /afsbackups/hpc2n.umu.se/L.ase.0.1169180280
API    399,326  B  01/20/2007 05:10:26    AFSDUMP     A  /afsbackups/hpc2n.umu.se/L.ase.1169180280.1169266200

The backup consists of a full dump of 13MB and an incremental dump of 399kB. It fits nicely into my scratch filesystem, so I restore it there:
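For volumes with larger dumps, the sizes in the listing can be added up rather than estimated by eye. The helper below is a sketch with a hypothetical function name; it assumes the listing layout shown above, where each object stored through the API starts with "API" followed by the size in bytes with thousands separators.

```shell
# total_backup_bytes (hypothetical helper)
# Read `dsmc q back` output on stdin and print the sum, in bytes, of the
# Size column for all lines that start with "API".
# Assumption: the second field is the size, written like 13,864,956.
total_backup_bytes() {
    awk '$1 == "API" {
        gsub(/,/, "", $2)      # remove the thousands separators
        sum += $2
    }
    END { printf "%.0f\n", sum }'
}
```

Piping the query above through `total_backup_bytes` prints 14264282 for the two dumps listed, which can then be compared with the free space reported by df for the scratch filesystem.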

dsmc restore -asnode=afsbackup_node.hpc2n.umu.se -verbose '/afsbackups/hpc2n.umu.se/L.ase.*' /scratch/

Because dsmc seems to assume that file permissions are stored with the files, it tends to set them to random values for files stored with tsmpipe (or other API tools). Set proper file permissions on the restored files:

chmod 600 /scratch/hpc2n.umu.se/L.ase.*
chown root:root /scratch/hpc2n.umu.se/L.ase.*

4.2 Restoring volume dump files to AFS

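Our target server is mamba and the target partition is b. We restore the volume to the new name L.ase.restore . The commands below use relative file names, so run them from the directory holding the restored dumps (/scratch/hpc2n.umu.se/ in this example).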

First, we restore the full dump:

vos restore mamba b L.ase.restore -file L.ase.0.1169180280 -localauth

Then we add the incremental dump:

vos restore mamba b L.ase.restore -file L.ase.1169180280.1169266200 -overwrite incremental -localauth
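Since the incremental dump only applies on top of the full dump it was taken against, it can be worth checking that the two files belong together before restoring. The following sketch uses a hypothetical function name and relies on the naming scheme volumename.dependentdumptime.dumptime; it prints the two vos restore commands shown above instead of running them, so the output can be reviewed first.

```shell
# restore_commands SERVER PARTITION NEWNAME FULLDUMP [INCREMENTAL]
# (hypothetical helper) Print the vos restore commands in the order they
# must be run. Fails if the incremental dump does not depend on the given
# full dump, judged by the time stamps embedded in the file names.
restore_commands() {
    server=$1 partition=$2 newname=$3 full=$4 incr=${5:-}

    if [ -n "$incr" ]; then
        full_time=${full##*.}      # dump time of the full dump
        rest=${incr%.*}            # incremental name without its dump time
        dependent=${rest##*.}      # dump time the incremental depends on
        if [ "$dependent" != "$full_time" ]; then
            echo "error: $incr does not depend on $full" >&2
            return 1
        fi
    fi

    # Full dump first, then the incremental on top of it.
    echo "vos restore $server $partition $newname -file $full -localauth"
    if [ -n "$incr" ]; then
        echo "vos restore $server $partition $newname -file $incr -overwrite incremental -localauth"
    fi
}
```

Calling `restore_commands mamba b L.ase.restore L.ase.0.1169180280 L.ase.1169180280.1169266200` prints the two commands used in this example, full dump first.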

If there were no unexpected errors, the volume is now restored. Proceed with creating mountpoint(s), backup volumes and readonly clones as appropriate.

$Id: backup.txt,v 1.8 2008/02/12 15:20:04 nikke Exp $


2009-06-12 - wmaster