Storage and compute cluster

The storage and compute cluster is intended as a replacement for CSE's VMWare and SAN infrastructure. Its primary functions are:

  1. A resilient, redundant storage cluster consisting of multiple cheap[-ish] rack-mounted storage nodes running Linux and Ceph using multiple local SSD drives, and
  2. Multiple compute nodes running Linux and QEMU/KVM acting as both secondary storage nodes and virtual machine hosts. These compute nodes are similar in CPU count and RAM to CSE's discrete login/VLAB servers.

Important implementation considerations

  1. Data in the storage cluster is replicated in real time across multiple hosts, so the failure of one or more nodes will not cause loss of data, AND when any such failure occurs the cluster software (Ceph) will automatically rebuild the lost replicas on the remaining nodes.
  2. Compute nodes are all similarly configured in terms of networking, CPU count and RAM, so should any compute node fail, the virtual machines running on it can be migrated to, or restarted on, another compute node without loss of functionality.
  3. While initially co-located in the same data centre (K17), the intention is that storage and compute nodes can be distributed across multiple data centres (especially having two located at the UNSW Kensington campus) so that a data centre failure, rather than just the failure of a single host, does not preclude restoring full operation of all hosted services.

Concept

Broad concept of the storage and compute cluster

The diagram at the right shows the basic concept of the cluster.

  1. The primary storage and compute nodes have 10Gb network interfaces used to access and maintain the data store.
  2. There are two network switches providing redundancy for the storage node network traffic and ensuring that at least one compute node will have access to the data store in case of switch failure.
  3. Management of the cluster happens via a separate subnetwork to the data store's own traffic network.

Additional storage and compute nodes can be added without limit.

Of course, if/when the cluster is decentralised, the networking will have to be revisited according to the networking available at the additional data centres.
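
As a concrete illustration, a node's /etc/network/interfaces on Debian might keep the management and storage networks separate roughly as follows (the interface names and addresses here are assumptions for illustration, not the actual site configuration):

  # Management network (1Gb) - SSH and cluster administration
  auto eno1
  iface eno1 inet static
      address 192.168.10.11/24
      gateway 192.168.10.1

  # Storage network (10Gb) - Ceph traffic only
  auto enp2s0f0
  iface enp2s0f0 inet static
      address 192.168.100.11/24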

What Ceph provides

Fundamentally, Ceph provides a redundant, distributed data store on top of physical servers running Linux. Ceph itself manages replication, patrol reads/scrubbing and redundancy maintenance, ensuring data remains automatically and transparently available in the face of disk, host and/or site failures.

The data is presented for use primarily, but not exclusively, as network-accessible raw block devices (RADOS Block Devices or RBD's) for use as filesystem volumes or boot devices (the latter especially by virtual machines); and as mounted network file systems (CephFS) similar to NFS. In both cases the data backing the device or file system is replicated across the cluster.
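
For example, the following commands create a replicated pool, create and map an RBD image in it, and mount a CephFS file system; the pool, image and mount point names are made up for the example:

  # Block device: create a pool, create an image, and map it on a client
  ceph osd pool create rbd 128
  rbd pool init rbd
  rbd create rbd/vm-disk-0 --size 20G
  rbd map rbd/vm-disk-0                 # appears as /dev/rbd0 on the client

  # File system: mount CephFS with the kernel client
  # (add a cephx secret/secretfile option when auth is enabled)
  mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin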

Each storage and compute node typically runs one or more Ceph daemons while other servers accessing the data stored in the cluster will typically only have the Ceph libraries and utilities installed.

The Ceph daemons are:

  1. mon - monitor daemon - multiple instances across multiple nodes maintaining status and raw data replica location information. Ceph uses a quorum of mon's to avoid split brain issues and thus the number of mon's should be an odd number greater than or equal to three.
  2. mgr - manager daemon - one is required to assist the mon's, so it's useful to have two in case one dies.
  3. mds - a metadata server for CephFS (such as file permission bits, ownership, time stamps, etc.). The more the merrier and, obviously, two or more is a good thing.
  4. osd - the object store daemon. One per physical disk, so there can be more than one on a physical host. Handles maintaining, replicating, validating and serving the data on the disk. OSD's talk to each other a lot.
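
Once the cluster is running, a quick way to see which daemons are present and healthy (a sketch; output formats vary between Ceph releases):

  ceph -s            # overall health plus mon/mgr/mds/osd summary counts
  ceph mon stat      # monitor quorum membership
  ceph osd tree      # OSD's grouped by host, with up/down and in/out state
  ceph fs status     # CephFS file systems and their active/standby MDS daemons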

What QEMU/KVM provides

  1. Machine virtualisation
  2. Limited virtual networking, which is supplemented by the local host's own networking/bridging/firewalling
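
For example, a virtual machine can boot directly from an RBD image with its network interface attached to a bridge maintained on the host; the image name (vm-disk-0) and bridge name (br0) below are assumptions for illustration:

  qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 \
      -drive format=raw,file=rbd:rbd/vm-disk-0,if=virtio \
      -netdev bridge,id=net0,br=br0 \
      -device virtio-net-pci,netdev=net0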

Installing and configuring the physical servers to run Linux and Ceph

Servers are Dell.

  1. On the RAID controller:
    • Configure two disks for the boot device as RAID1 (mirrored)
    • Configure the remaining data store disks as RAID0 with one single component disk each
  2. Use eth0 to do a network install of Debian Bullseye selecting:
    • Static network addressing
    • SSH Server only
    • Everything installed in the one partition
    • Configure timezone, keyboard, etc.
  3. Reboot into installed OS
  4. Fix/enable root login
  5. Change sshd_config to include (see the sketch after this list):
    • UsePAM no
    • PasswordAuthentication no
  6. Install packages:
    • apt-get install wget gnupg man-db lsof strace tcpdump iptables-persistent rsync software-properties-common
  7. Ensure unattended upgrades are installed and enabled (dpkg-reconfigure unattended-upgrades)
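
The sshd_config change in step 5 amounts to a fragment like the following (PermitRootLogin is an assumption following from the key-only root login in step 4); restart ssh afterwards:

  # /etc/ssh/sshd_config
  PermitRootLogin prohibit-password
  UsePAM no
  PasswordAuthentication no

  systemctl restart ssh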

Install Ceph:

Refer to the manual installation procedures at Installation (Manual).

  1. wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
  2. apt-add-repository 'deb https://download.ceph.com/debian-quincy/ bullseye main'
  3. apt-get install ceph
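
A quick sanity check that the packages came from the Quincy repository rather than Debian's own:

  ceph --version          # should report a 17.2.x (quincy) release
  apt-cache policy ceph   # shows which repository the installed package came from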

When first creating a Ceph cluster you need to do the following ONCE. Once the cluster is running, no further bootstrapping is required.

  1. Perform Monitor Bootstrapping (Monitor bootstrapping)
  2. You may need to run chown -R ceph:ceph /var/lib/ceph/mon/ceph-<host> to get the mon started the first time.
  3. systemctl enable ceph-mon@<host>
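
For reference, the core of the monitor bootstrapping in step 1 looks roughly like the following; <host>, <ip> and <fsid> are placeholders, /etc/ceph/ceph.conf must already contain the fsid, mon initial members and mon host entries, and the linked manual procedure remains the authoritative version:

  ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
  ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin \
      --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
  ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
  monmaptool --create --add <host> <ip> --fsid <fsid> /tmp/monmap
  mkdir -p /var/lib/ceph/mon/ceph-<host>
  ceph-mon --mkfs -i <host> --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring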

Set up a mgr daemon:

  1. mkdir /var/lib/ceph/mgr/ceph-<host>
  2. ceph auth get-or-create mgr.<host> mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-<host>/keyring
  3. chown -R ceph:ceph /var/lib/ceph/mgr/ceph-<host>
  4. systemctl start ceph-mgr@<host>
  5. systemctl enable ceph-mgr@<host>

Some runtime configuration related to running mixed-version Ceph environments (which we hopefully don't do):

  1. ceph config set mon mon_warn_on_insecure_global_id_reclaim true
  2. ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed true
  3. ceph config set mon auth_allow_insecure_global_id_reclaim false
  4. ceph mon enable-msgr2
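
The resulting settings can be read back to confirm they took effect:

  ceph config get mon auth_allow_insecure_global_id_reclaim
  ceph mon dump           # the monitors should now advertise v2 (msgr2) addresses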

Check Ceph cluster-of-one-node status:

  1. ceph status