Proposed home directory architecture, configuration and management - project notes

Preliminary

  • Don't over-engineer
  • Don't over-provision
  • Don't require live VM migration capability (i.e. migration while the VM is running)
  • Build into the design the ability to expand performance and capacity as required
  • Expansion to be based on historical, monitored performance and on available support capabilities
  • No disk quotas initially
  • Plan for an initial 100 users (to be selected by Andrew): 50 on Ceph storage and 50 on AWS Storage Gateway storage

Design

  • Two 1TB volumes for home directories - one in Ceph, and one in AWS (Storage Gateway). Assign each one its own IP address in hostlist.csv
  • Two virtual AWS Storage Gateways. The local, physical SSD cache storage attached to each AWS Storage Gateway must exceed the size of the volumes it makes available via iSCSI; otherwise the Gateway will be continually fetching data from AWS S3, and there is a dollar cost to doing this
  • Consider (only) how to implement a Linux VM in the storage/compute cluster that makes Ceph storage available as iSCSI devices to other hosts. This would: a) be an alternative to AWS Storage Gateway, b) possibly offer reduced latency or higher speed, and c) obviate the need for storage clients to have the Ceph libraries installed. Would need future implementation and testing
  • Two virtual NFS servers (without local home directory storage) - see the attach-and-export sketch after this list
  • All VMs in the storage/compute cluster
  • If learning libvirt is an obstacle, don't use it initially and leave it as "an exercise for later"
  • Design and implement useful metrics. E.g., not just latency, but ways to measure speed, responsiveness and performance from a user's experience perspective
  • Consider if there might be obstacles to getting these home directories into the New World backup system
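
A minimal sketch (not a tested procedure) of how one of the NFS servers might attach a Storage Gateway-backed home-directory volume over iSCSI, export it, and bring up that volume's service IP. The gateway address, target IQN, device name, export path and IP addresses are all placeholders for illustration; the real values would come from the Storage Gateway configuration and hostlist.csv.

  # Discover and log in to the Storage Gateway's iSCSI target
  # (gateway address and target IQN are placeholders)
  iscsiadm -m discovery -t sendtargets -p 10.0.1.21:3260
  iscsiadm -m node -T iqn.1997-05.com.amazon:home-aws1 -p 10.0.1.21:3260 --login

  # First use only: make a filesystem on the new LUN (device name will vary)
  mkfs.xfs /dev/sdb

  # Mount it and export it over NFS
  mkdir -p /export/home-aws1
  mount /dev/sdb /export/home-aws1
  echo '/export/home-aws1 10.0.1.0/24(rw,sync,no_subtree_check)' >> /etc/exports
  exportfs -ra

  # Bring up the volume's dedicated service IP (as recorded in hostlist.csv)
  ip addr add 10.0.1.51/24 dev eth0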

Monitoring and alarms

  • Implement meaningful alarms, warnings and alerts - Nagios - nw-syd-monitor1?
  • Implement historical volume activity[1], network interface traffic, CPU usage, memory usage and load-average records and graphs - nw-syd-monitor2
  • Extend the SNMP daemon rather than using SSH to collect data - see the snmpd extend sketch after this list
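
A sketch of the SNMP extend approach, assuming net-snmp's snmpd is running on the NFS servers. The stats script and the NFS server hostname are hypothetical; the extend directive itself and the NET-SNMP-EXTEND-MIB objects are standard net-snmp.

  # /etc/snmp/snmpd.conf on the NFS server: publish the output of a local
  # stats script under NET-SNMP-EXTEND-MIB (script path/name is hypothetical)
  extend homedir-iostat /usr/local/bin/homedir-iostat.sh

  # Reload snmpd to pick up the change
  systemctl restart snmpd

  # From the monitoring host (e.g. nw-syd-monitor2), poll the result
  # instead of running the script over SSH (NFS server name is a placeholder)
  snmpwalk -v2c -c public nw-syd-nfs1 NET-SNMP-EXTEND-MIB::nsExtendOutputFull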

To learn AND DOCUMENT

  • iSCSI, iscsid (the daemon, managed by systemd), iscsiadm (the administration/management tool)
  • Building, attaching and detaching volumes of both types to an NFS server (i.e., Linux host) without rebooting
  • Ceph (general configuration and operation)
  • AWS Storage Gateway (general configuration and operation)
  • Moving volumes (both types) between NFS servers, and between AWS Storage Gateways (as appropriate) - see the volume-move sketch after this list
  • Using ip addr … to add/remove home directory volume IP addresses from NFS servers
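
A sketch of moving a home-directory volume from one NFS server to the other without rebooting, assuming the volume is presented over iSCSI and has its own service IP as in the earlier sketch. The IQN, addresses, paths and device names are placeholders, and the exact sequence would need testing.

  # On the NFS server giving up the volume:
  exportfs -u '*:/export/home-aws1'                  # stop exporting it
  umount /export/home-aws1                           # detach the filesystem
  iscsiadm -m node -T iqn.1997-05.com.amazon:home-aws1 -p 10.0.1.21:3260 --logout
  ip addr del 10.0.1.51/24 dev eth0                  # release the volume's service IP

  # On the NFS server taking over the volume:
  iscsiadm -m discovery -t sendtargets -p 10.0.1.21:3260
  iscsiadm -m node -T iqn.1997-05.com.amazon:home-aws1 -p 10.0.1.21:3260 --login
  mount /dev/sdb /export/home-aws1                   # device name may differ
  exportfs -o rw,sync,no_subtree_check '*:/export/home-aws1'
  ip addr add 10.0.1.51/24 dev eth0                  # take over the service IP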

Future considerations

Footnotes

  1. See rbd -p rbd perf image iostat and rbd -p rbd perf image iotop. Don't know about AWS Storage Gateway. Maybe SNMP?