Controlling Resources & Privileges with Linux namespaces, cgroups & limits

resource limits

The Linux setrlimit system call is useful when running student code. The bash builtin ulimit provides convenient access to it. Some useful ulimit options for running student code, each with a typical value:

-c 0      # the maximum size in KiB (1024-byte units) of core files created
-d 100000 # the maximum size in KiB of a process's data segment
-f 8192   # the maximum size in KiB of files written
-n 256    # the maximum number of open file descriptors
-s 32000  # the maximum stack size in KiB
-t 60     # the maximum amount of cpu time in seconds
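
As a sketch of how these options might be combined, the hypothetical helper run_limited below (the name is an assumption, not an existing tool) sets several of the limits in a child bash and then execs the given command, so the limits apply only to that command. The stack limit is left out of the sketch because setting a soft limit above the current hard limit fails.

```shell
# Hypothetical helper: run a command under some of the limits listed above.
# The limits are set in a child bash, so the calling shell is unaffected.
run_limited() {
    bash -c '
        ulimit -c 0        # no core files
        ulimit -d 100000   # data segment size
        ulimit -f 8192     # size of files written
        ulimit -n 256      # open file descriptors
        ulimit -t 60       # cpu seconds
        exec "$@"          # replace the child bash with the command
    ' run_limited "$@"
}

# Usage: report two of the limits as seen by the command being run.
run_limited bash -c 'echo "open files: $(ulimit -n), cpu seconds: $(ulimit -t)"'
```

Because the limits are inherited across exec, any program started this way (and its children) runs under them.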

For example, the dd command below tries to write as many zeros as possible to the file /tmp/zeros.

The file size limit set by ulimit stops it at 1000 KiB (1,024,000 bytes).

$ bash -c 'ulimit -c 0 -f 1000; dd if=/dev/zero of=/tmp/zeros'
File size limit exceeded
$ ls -l /tmp/zeros
-rw-r--r-- 1 andrewt andrewt 1024000 Jun 21 20:33 /tmp/zeros
$ 
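
Marking software usually needs to know why a command stopped. The sketch below repeats the dd experiment with a temporary file and inspects the exit status. The helper name report_fsize_limit is hypothetical. It relies on the shell convention that a command killed by a signal has exit status 128 plus the signal number, which kill -l can translate to a signal name.

```shell
# Hypothetical helper: run dd under a 1000 KiB file size limit, then report
# how the command ended and how many bytes it managed to write.
report_fsize_limit() {
    out_file=$(mktemp)
    bash -c 'ulimit -c 0; ulimit -f 1000; dd if=/dev/zero of="$1" 2>/dev/null' bash "$out_file"
    status=$?
    if [ "$status" -gt 128 ]
    then
        # status above 128: the command was terminated by a signal
        echo "killed by signal $(kill -l "$status")"
    else
        echo "exited with status $status"
    fi
    echo "bytes written: $(wc -c < "$out_file")"
    rm -f "$out_file"
}

report_fsize_limit
```

When the file size limit is what stopped the command, the signal is normally SIGXFSZ. If that signal is being ignored, the write fails with an error instead and the command exits normally with a non-zero status.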

A resource limit on virtual memory should be avoided because some programs reserve far more address space than the memory they actually use.

For example, the address sanitizer available in gcc & clang breaks with a 1GB limit on virtual memory.

$ echo 'int main(void){}'|clang -fsanitize=address -x c -
$ ./a.out
$ bash -c 'ulimit -v 1000000; ./a.out'
==1690658==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
$ 

The resource limit on the number of processes has limited utility because it is a limit on the total number of processes a user can run (processes running as subordinate uids are included in the total).

$ bash -c 'ulimit -u 2; date; date'
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
....

This means a resource limit can't reliably be used, for example, to stop a fork bomb in student code from interfering with marking software. See the cgroups section below for how to do this instead with a cgroup.


unshare

The Linux unshare system call, also available on the command line as the unshare command, uses namespaces to allow code to be executed with reduced privileges, reduced access to resources or a changed view of the system.


user namespaces

Code can be executed with another uid inside a user namespace (see user_namespaces(7)).

$ id -u
517
$ unshare --map-user=42 id -u
42
$ unshare --map-user=42 cat /proc/self/uid_map
        42        517          1
$ unshare --map-root-user id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
$ unshare --map-root-user cat /proc/self/uid_map
         0        517          1

User namespaces allow superuser-like privileges within the namespace, for example chroot:

$ lsb_release -c
Codename:   bullseye
$ unshare --map-current-user --root /web/teachadmin/filesystems/bookworm lsb_release -c
Codename:   bookworm

Privileges outside the namespace are unchanged:

$ unshare --map-root-user ls /root
ls: cannot open directory '/root': Permission denied

mount namespaces

Mount namespaces allow code to be executed with a different view of the file system, using bind mounts.

For example, we can mount /tmp over /home and then execute code without it being able to access home directories:

$ unshare --map-root-user --mount sh -c '
mount --rbind /tmp /home
unshare --map-user=andrewt --map-group=andrewt ls /home/andrewt
'
ls: cannot access '/home/andrewt': No such file or directory

network namespaces

Network namespaces allow a process to be executed with a different view of the network. For example:

$ ping -c 1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.019 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.019/0.019/0.019/0.000 ms
$ unshare --net --map-current-user ping -c 1 127.0.0.1
ping: connect: Network is unreachable
$ curl -sI https://www.unsw.edu.au|sed 1q
HTTP/2 200 
$ unshare --net --map-current-user curl -I https://www.unsw.edu.au
curl: (6) Could not resolve host: www.unsw.edu.au
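
The namespace options shown so far can be combined with each other and with ulimit. As a minimal sketch, the hypothetical helper run_sandboxed below (the name and the choice of limits are assumptions) runs a command with no network access and under a few resource limits. The sketch first probes whether unshare can create the namespaces, since some systems do not allow unprivileged users to do so.

```shell
# Hypothetical helper: run a command in new user and network namespaces,
# with a few resource limits applied before the command starts.
run_sandboxed() {
    unshare --net --map-current-user bash -c '
        ulimit -c 0      # no core files
        ulimit -f 8192   # size of files written
        ulimit -t 60     # cpu seconds
        exec "$@"
    ' run_sandboxed "$@"
}

# Probe first: creating the namespaces may not be permitted on this system.
if unshare --net --map-current-user true 2>/dev/null
then
    run_sandboxed bash -c 'echo "file size limit inside sandbox: $(ulimit -f)"'
else
    echo "unshare --net --map-current-user failed on this system"
fi
```

Other unshare options used on this page, such as --mount, could be added to the same unshare command in the same way.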

cgroups

cgroups can limit the number of processes created by a command and its children, allowing robust handling of code containing a fork bomb.

For example, via systemd-run:

$ systemd-run --quiet --user --scope -p TasksMax=5 sh -c '
    for i in $(seq 1 10)
    do
        sleep 10 &
        echo $i;
    done
    '
1
2
3
4
/bin/sh: 0: Cannot fork

See systemd.resource-control(5) for other resources that can be controlled by cgroups.

For example, controlling the memory use of a set of processes:

$ printf '
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    calloc(1, 1e8);
    sleep(5);
}'|
clang -x c - -o calloc_100mb_and_sleep
$ systemd-run --quiet --user --scope -p MemoryMax=200M -p MemorySwapMax=200M sh -c '
    for i in $(seq 1 6)
    do
        ( ./calloc_100mb_and_sleep; echo process $i exit status was $?) &
    done
    wait
    '
Killed
process 3 exit status was 137
Killed
process 2 exit status was 137
process 4 exit status was 0
process 1 exit status was 0
process 5 exit status was 0
process 6 exit status was 0
$
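
The two systemd-run examples above can be folded into one reusable wrapper. This is only a sketch: the helper name run_in_scope and its argument order are assumptions, and it uses only the properties already shown (TasksMax, MemoryMax, MemorySwapMax). It probes first, because systemd-run --user --scope needs a working systemd user session.

```shell
# Hypothetical helper: run a command in a transient scope with a cap on the
# number of tasks and on memory.
# usage: run_in_scope TASKS_MAX MEMORY_MAX command [args...]
run_in_scope() {
    tasks_max=$1
    memory_max=$2
    shift 2
    systemd-run --quiet --user --scope \
        -p TasksMax="$tasks_max" \
        -p MemoryMax="$memory_max" \
        -p MemorySwapMax="$memory_max" \
        "$@"
}

# Probe first: this fails when no systemd user session is available.
if run_in_scope 20 200M true 2>/dev/null
then
    run_in_scope 20 200M sh -c 'echo "running with capped tasks and memory"'
else
    echo "systemd-run --user --scope is not usable in this session"
fi
```

An exit status of 137 from a command run this way, as in the output above, is 128 plus 9, meaning the process was killed with SIGKILL.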

subordinate uids and gids

/etc/subuid and /etc/subgid allocate ranges of subordinate uids and gids that an unprivileged user can map into a child user namespace.

buildah unshare creates a user namespace in which root (uid/gid 0) maps to the user's own uid/gid and uids/gids 1..65536 map to the user's subordinate uids/gids.

$ buildah unshare cat /proc/self/uid_map
         0        517          1
         1   29328512      65536

setpriv can be used to run code as a subordinate uid/gid:

$ temp_dir=$(mktemp -d)
$ cd "$temp_dir"
$ chmod 777 .
$ buildah unshare sh -c '
    for i in $(seq 1 9)
    do
        setpriv --ruid $i --rgid $i --clear-groups sh -c "touch file$i"
    done
    ls -l file*  # ls inside the name space
'
-rw-r--r-- 1 daemon daemon 0 Jun 22 15:59 file1
-rw-r--r-- 1 bin    bin    0 Jun 22 15:59 file2
-rw-r--r-- 1 sys    sys    0 Jun 22 15:59 file3
-rw-r--r-- 1 sync   adm    0 Jun 22 15:59 file4
-rw-r--r-- 1 games  tty    0 Jun 22 15:59 file5
-rw-r--r-- 1 man    disk   0 Jun 22 15:59 file6
-rw-r--r-- 1 lp     lp     0 Jun 22 15:59 file7
-rw-r--r-- 1 mail   mail   0 Jun 22 15:59 file8
-rw-r--r-- 1 news   news   0 Jun 22 15:59 file9
$ ls -l file*  # ls outside the name space
-rw-r--r-- 1 29328512 29328512 0 Jun 22 15:59 file1
-rw-r--r-- 1 29328513 29328513 0 Jun 22 15:59 file2
-rw-r--r-- 1 29328514 29328514 0 Jun 22 15:59 file3
-rw-r--r-- 1 29328515 29328515 0 Jun 22 15:59 file4
-rw-r--r-- 1 29328516 29328516 0 Jun 22 15:59 file5
-rw-r--r-- 1 29328517 29328517 0 Jun 22 15:59 file6
-rw-r--r-- 1 29328518 29328518 0 Jun 22 15:59 file7
-rw-r--r-- 1 29328519 29328519 0 Jun 22 15:59 file8
-rw-r--r-- 1 29328520 29328520 0 Jun 22 15:59 file9
$ cd .. 
$ rm -rf "$temp_dir"
$

Running code as a subordinate uid can prevent it, for example, from accessing files that are readable or writable only by the user's primary uid.

This example shows code run as a subordinate uid and gid being unable to read the user's home directory (which is not group-readable or world-readable):

$ ls -ld /home/andrewt
drwx--x--x 304 andrewt andrewt 28672 Jun 23 09:01 /home/andrewt
$ buildah unshare setpriv --ruid nobody --rgid nobody --clear-groups sh -c "id; ls ~"
uid=65534(nobody) gid=60001(nobody) groups=60001(nobody)
ls: cannot open directory '/home/andrewt': Permission denied