6/5/23, 8:07 PM README · UCSC-CSE-130/project5-container Wiki
https://github.com/UCSC-CSE-130/project5-container/wiki/README 1/12
README
Shun Kashiwa edited this page 2 days ago · 4 revisions
Project 5: Container
GOAL: In this project, you will apply what you have learned about operating systems to build a
small container runtime that can run a containerized process on Linux.
Warning: A student reported that overlayFS does not work on Windows. Please make a Piazza
post if you encounter a similar issue. If you have a virtual machine running Linux, try running
the container runtime there.
Introduction to Container
What is a Container?
Throughout the course, we have been using Docker and Dev Container to run our code. But what
exactly is a container?
A container is a lightweight and isolated execution environment that encapsulates an application
and its dependencies. It provides a consistent and reproducible environment across different
systems, allowing applications to run reliably regardless of the underlying infrastructure.
Container Image
Containers are created from container images, which are self-contained packages that include the
application code, runtime, system tools, libraries, and configuration files required for the application
to run.
Container Runtime
A container runtime is responsible for managing the lifecycle of containers. It provides an interface
between the container and the host operating system, orchestrating the necessary resources and
ensuring isolation and security within the container environment.
Providing Isolation
One of the most critical features of container runtimes is to provide isolation. That is, whatever
happens inside a container does not affect the host system or other containers.
Modern container runtimes provide isolation for many OS abstractions, such as CPU, memory,
network, etc. In this project, we focus on two essential abstractions: processes and files.
Isolating Processes: PID Namespace
Our container runtime should provide isolation for processes. For example, the processes running
on the host should not be visible to the processes inside the container.
Let's check how Docker isolates processes. We use the ps command to see the information about
running processes.
If you execute ps -A, it lists all processes that are running on the system.
PID TTY TIME CMD
1 ? 06:30:08 systemd
2 ? 00:00:44 kthreadd
3 ? 00:00:00 rcu_gp
4 ? 00:00:00 rcu_par_gp
6 ? 00:00:00 kworker/0:0H-kblockd
...
This is the result of ps -A on my server. We can see many processes are running.
Let us create a new Docker container and execute the same command inside the container:
$ docker run --rm -it alpine
$ ps -A
PID USER TIME COMMAND
1 root 0:00 /bin/sh
7 root 0:00 ps -A
The first command creates a new Docker container using the alpine image. It opens a shell inside
the container, so we run ps -A. Even though many processes are running on the host system, they
are not visible inside the container.
In Linux, the isolation of processes is achieved using the PID namespace. A process ID (PID) is a
unique number that identifies a running process. Every process belongs to a PID namespace and it
can only see processes in the same PID namespace.
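On Linux, a process can check which PID namespace it belongs to by reading the symbolic link /proc/self/ns/pid. The following is a minimal sketch, assuming /proc is mounted; the helper name pid_namespace_id is hypothetical and is not part of container.c. Two processes that report the same value share a PID namespace.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: copies the target of /proc/self/ns/pid (a string of
 * the form "pid:[<number>]") into buf.
 * Returns 0 on success, -1 on failure. */
int pid_namespace_id(char *buf, size_t len) {
  if (len == 0) {
    return -1;
  }
  /* readlink does not null-terminate, so reserve one byte for '\0'. */
  ssize_t n = readlink("/proc/self/ns/pid", buf, len - 1);
  if (n < 0) {
    return -1;
  }
  buf[n] = '\0';
  return 0;
}
```

Printing the result on the host and again inside a container should show two different values. The shell command readlink /proc/self/ns/pid reads the same link.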
Isolating Filesystem: Overlay Filesystem
Our container runtime also supports isolating the filesystem. Each container has its own filesystem
and cannot affect the host or other containers.
Let's check how Docker provides filesystem isolation. We use the same alpine image.
In one terminal, let us create a container and create a file foo in its root directory.
$ docker run --rm -it alpine
$ echo foo > foo
$ ls /
bin etc home media opt root sbin sys usr
dev foo lib mnt proc run srv tmp var
Open another terminal, create a new container, and run ls.
$ docker run --rm -it alpine
$ ls /
bin etc lib mnt proc run srv tmp var
dev home media opt root sbin sys usr
Even though both containers are created from the same image, foo is not visible in the new
container.
One way to achieve this is to use the Overlay Filesystem. The Overlay filesystem is a type of union
filesystem that allows multiple directories to be mounted together, presenting a single unified view.
It provides a way to overlay a read-write filesystem on top of a read-only filesystem, creating a
combined view that appears as a single coherent filesystem.
When a file or directory is accessed, the Overlay filesystem looks for it in the topmost layer first. If
the file is found, it is returned. If not, the filesystem searches the lower layers in a specific order until
it locates the file. This allows modifications to be made to the topmost layer, while the lower layers
remain unchanged. Changes made to the topmost layer are stored separately, without modifying
the underlying read-only layers.
Let's see how the overlay filesystem works with a simple example. (Note that this example might not
work inside DevContainer because Docker uses overlayFS and it doesn't allow nesting. container.c
gets around this by creating a tmpfs, as described here).
To use the overlay filesystem, we need three directories: lower , upper , and work . lower will be a
read-only directory that provides an "image", upper will store all changes on top of lower , and
work is used by the overlay filesystem as a workspace.
First, we use the following commands to create the directories. merged is the directory where we
will mount the overlay filesystem.
$ mkdir lower upper work merged
Then, we create a file in the lower directory, which the overlay filesystem will treat as read-only.
$ echo "this is in lower" > lower/foo
Now, we are ready to create a new overlay filesystem. Use the following command to create an
overlay filesystem and mount it at merged.
$ mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged
The overlay filesystem is now mounted at merged. It provides a unified view of lower and upper.
$ ls merged
foo
$ cat merged/foo
this is in lower
Let's make some changes to the file in the overlay filesystem.
$ echo "new foo" > merged/foo
Because the overlay filesystem treats lower as read-only, lower/foo is not modified. Instead, the
overlay filesystem creates upper/foo with the updated content. Because upper/foo now exists,
merged/foo refers to the updated content at upper/foo.
$ cat merged/foo
new foo
$ cat upper/foo
new foo
$ cat lower/foo
this is in lower
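The lookup order described earlier (topmost layer first, then the lower layer) can be modelled in user space. The sketch below is only an illustration of that order, not how the kernel implements it: the function name overlay_resolve is hypothetical, it handles a single lower layer, and it ignores details such as deleted files (whiteouts) and merged directories.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical user-space model of overlay lookup: prefer the upper layer
 * and fall back to the lower layer. The path that was found is written to
 * out. Returns 0 if the name exists in upper, 1 if it exists only in lower,
 * and -1 if it exists in neither layer or the path does not fit in out. */
int overlay_resolve(const char *upper, const char *lower, const char *name,
                    char *out, size_t len) {
  const char *layers[] = {upper, lower};
  for (int i = 0; i < 2; i++) {
    int n = snprintf(out, len, "%s/%s", layers[i], name);
    if (n < 0 || (size_t)n >= len) {
      return -1;
    }
    if (access(out, F_OK) == 0) {
      return i;
    }
  }
  return -1;
}
```

With the directories from the example, overlay_resolve("upper", "lower", "foo", ...) would return 1 before the write to merged/foo and 0 afterwards, because upper/foo exists only once the file has been modified.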
To provide an isolated filesystem for each container, we make an overlay filesystem. The lower
directory is the container image that stores all files and directories needed for that container. Each
container gets a unique upper directory, so changes by the container do not affect other
containers.
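One way to give each container its own upper directory is to derive the directory names from the container ID. The sketch below assumes the {root}/{id}/upper, {root}/{id}/work, and {root}/{id}/merged layout that this document uses later with /tmp/container as the root; the helper names ensure_dir and make_container_dirs are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Creates one directory, treating "already exists" as success. */
static int ensure_dir(const char *path) {
  if (mkdir(path, 0755) == 0 || errno == EEXIST) {
    return 0;
  }
  return -1;
}

/* Hypothetical helper: creates <root>/<id> and its upper, work, and merged
 * subdirectories. <root> itself must already exist.
 * Returns 0 on success, -1 on failure. */
int make_container_dirs(const char *root, const char *id) {
  static const char *const subdirs[] = {"", "/upper", "/work", "/merged"};
  char path[4096];
  for (size_t i = 0; i < sizeof subdirs / sizeof subdirs[0]; i++) {
    int n = snprintf(path, sizeof path, "%s/%s%s", root, id, subdirs[i]);
    if (n < 0 || (size_t)n >= sizeof path || ensure_dir(path) != 0) {
      return -1;
    }
  }
  return 0;
}
```

Two containers with different IDs end up with different upper directories, so a file written by one never appears in the other.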
Implementing container.c
The goal of this project is to complete container.c. Once completed, it can create a container
from an image and execute a command inside the container.
Creating an image directory
In container.c , an image is a directory under ./images that stores all the files and directories
required for the system. You can think of an image directory as a snapshot of the system root
directory.
The easiest way to create an image is to use the docker export command. Let us create an image
directory from the alpine docker image.
First, we create a Docker container using docker run --rm -it alpine sh. This opens a shell inside
the newly created container.
Second, we need to get the ID of the container. Open a new terminal and run docker ps to see the
list of running Docker containers.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f1cf18783484 alpine "sh" 2 minutes ago Up 2 minutes boring_sanderson
Find the container and copy the container ID (f1cf18783484 in this case).
Then, we run docker export {container ID} > alpine.tar to create a tarball of the image.
$ docker export f1cf18783484 > alpine.tar
Finally, we extract files in the tarball to ./images/{image name} using the following commands.
$ mkdir images/alpine
$ tar -xf alpine.tar -C images/alpine
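Before creating a container, a runtime may want to confirm that the extracted image directory is really there. The following is a minimal sketch of such a check; the helper name image_exists and the images_root parameter are assumptions for illustration and do not come from container.c.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if <images_root>/<image> exists and is a
 * directory, and 0 otherwise. */
int image_exists(const char *images_root, const char *image) {
  char path[4096];
  struct stat st;
  int n = snprintf(path, sizeof path, "%s/%s", images_root, image);
  if (n < 0 || (size_t)n >= sizeof path) {
    return 0;
  }
  return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

After the steps above, image_exists("./images", "alpine") should return 1 when called from the project directory.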
Now, we have the image directory called alpine. We use this identifier to specify the image.
$ ls images/alpine
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
Command-line Interface
container takes three or more arguments.
$ ./container
Usage: ./container [ID] [IMAGE] [CMD]...
The first argument (ID) specifies the unique ID of the container.
Docker assigns this randomly, but we require the user to provide one.
The ID can be at most 16 characters ( CONTAINER_ID_MAX ).
The second argument (IMAGE) specifies the image from which to create the container.
./images/{IMAGE} must exist and store all files required for this container.
The rest of the arguments specify the command to run inside the container.
There can be more than one, as the user might provide options.
For example, ./container my-container alpine echo "hello world" will
1. Create a container with ID my-container
2. Use the image located at ./images/alpine
3. Execute echo "hello world" inside the container
$ sudo ./container my-container alpine echo "hello world"
hello world
main
You will need to complete two functions in container.c: main and container_exec.
main is the entry point of the command-line interface. It needs to parse the command-line
arguments ( argv ) and create a child process by calling clone with appropriate parameters.
int clone_flags = SIGCHLD | CLONE_NEWNS | CLONE_NEWPID;
int pid = clone(container_exec, &child_stack, clone_flags, &container);
clone works similarly to the fork system call and creates a child process. The child process
executes the container_exec function and takes container as an argument, just like how we
passed arguments to a new thread in pthread_create. The CLONE_NEWPID and CLONE_NEWNS flags
give the child process separate PID and mount namespaces, which provide isolation; SIGCHLD is
the signal sent to the parent when the child exits.
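The call pattern can be tried without root by leaving out the namespace flags. The sketch below is an illustration under assumptions, not the project's implementation: run_child, child_main, and STACK_SIZE are hypothetical names, and the stack is a heap buffer rather than the child_stack used in container.c. Because the stack grows downward on most architectures, the sketch passes a pointer to the top of the buffer, as the clone manual page describes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for clone() and the CLONE_* flags in glibc */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* Hypothetical child entry point: exits with the int that arg points to. */
int child_main(void *arg) { return *(int *)arg; }

/* Hypothetical helper: runs fn(arg) in a child created by clone and returns
 * the child's exit status, or -1 on error. extra_flags could be
 * CLONE_NEWPID | CLONE_NEWNS, which requires root privileges. */
int run_child(int (*fn)(void *), void *arg, int extra_flags) {
  char *stack = malloc(STACK_SIZE);
  if (stack == NULL) {
    return -1;
  }
  /* Pass the highest address of the buffer as the child's stack pointer. */
  pid_t pid = clone(fn, stack + STACK_SIZE, SIGCHLD | extra_flags, arg);
  if (pid == -1) {
    free(stack);
    return -1;
  }
  int status = 0;
  pid_t waited = waitpid(pid, &status, 0);
  free(stack);
  if (waited == -1 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

Calling run_child(child_main, &code, 0) with code set to 7 should return 7. Passing CLONE_NEWPID | CLONE_NEWNS as extra_flags under sudo would place the child in new PID and mount namespaces.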
Add fields to the container struct and fill values in main so container_exec will have enough
information to create a container from an image and execute the command.
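As one possible shape for this, the sketch below parses argv into a struct. The struct name container_sketch and its fields are hypothetical; the real struct in container.c may look different. Since argv[argc] is always NULL, a pointer to argv[3] is already a null-terminated argument list that can later be handed to execvp.

```c
#include <string.h>

#define CONTAINER_ID_MAX 16

/* Hypothetical fields; the struct provided in container.c may differ. */
struct container_sketch {
  char id[CONTAINER_ID_MAX + 1]; /* container ID plus the terminating '\0' */
  const char *image;             /* image name, e.g. "alpine" */
  char **args;                   /* null-terminated command and arguments */
};

/* Fills c from the command line. Returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char **argv, struct container_sketch *c) {
  if (argc < 4) {
    return -1; /* need ID, IMAGE, and at least one CMD word */
  }
  if (strlen(argv[1]) > CONTAINER_ID_MAX) {
    return -1; /* ID is too long */
  }
  strcpy(c->id, argv[1]); /* safe: the length was checked above */
  c->image = argv[2];
  c->args = &argv[3]; /* argv[argc] is NULL, so this list is terminated */
  return 0;
}
```

For ./container my-container alpine echo "hello world", this would yield id "my-container", image "alpine", and args {"echo", "hello world", NULL}.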
container_exec
main executes container_exec in a child process with separate PID and mount namespaces.
container_exec needs to
1. create and mount an overlay filesystem
2. call change_root
3. use execvp to run the command
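Of these steps, the last one can be tried on its own without root privileges. The sketch below uses a hypothetical helper, run_command, that forks before calling execvp so that the caller survives and can report the exit status. container_exec would not need the fork, because it already runs in the child created by clone.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] (looked up through PATH) with the
 * null-terminated argument list argv in a child process.
 * Returns the child's exit status, or -1 on error. */
int run_command(char *const argv[]) {
  pid_t pid = fork();
  if (pid == -1) {
    return -1;
  }
  if (pid == 0) {
    execvp(argv[0], argv);
    _exit(127); /* only reached if execvp failed */
  }
  int status = 0;
  if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

The exit status 127 for a failed execvp follows the convention shells use for "command not found"; it is a choice made for this sketch.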
Creating an overlay filesystem
container_exec needs to create an overlay filesystem. The merged directory will have everything
inside the image directory plus the changes made inside the container and will be used as a root of
the filesystem inside the container.
To create an overlay filesystem, use the mount function.
int mount(const char *source, const char *target,
          const char *filesystemtype, unsigned long mountflags,
          const void *data);
source is often a path referring to a device. Because we are not mounting a device, use the
dummy string "overlay".
target specifies the directory at which to create the mount point. Use the merged directory
path: /tmp/container/{id}/merged .
filesystemtype specifies the type of the filesystem. Use "overlay".
mountflags provides options. Use MS_RELATIME.
data provides options specific to the filesystem. The overlay filesystem takes the three
arguments (lowerdir, upperdir, workdir) in the format:
lowerdir={lowerdir},upperdir={upperdir},workdir={workdir}
Construct a string of this format and pass the pointer.
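The data string can be built with snprintf. The sketch below is one way to do it under assumptions: overlay_options and mount_overlay are hypothetical helper names, the buffer size is arbitrary, and paths containing commas are not handled. The mount call itself uses the parameters listed above and needs root privileges to succeed.

```c
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical helper: writes "lowerdir=...,upperdir=...,workdir=..." into
 * out. Returns 0 on success, -1 if the string does not fit. */
int overlay_options(const char *lowerdir, const char *upperdir,
                    const char *workdir, char *out, size_t len) {
  int n = snprintf(out, len, "lowerdir=%s,upperdir=%s,workdir=%s", lowerdir,
                   upperdir, workdir);
  return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: mounts an overlay filesystem at merged.
 * Returns 0 on success, -1 on failure (errno is set by mount). */
int mount_overlay(const char *lowerdir, const char *upperdir,
                  const char *workdir, const char *merged) {
  char options[8192];
  if (overlay_options(lowerdir, upperdir, workdir, options,
                      sizeof options) != 0) {
    return -1;
  }
  return mount("overlay", merged, "overlay", MS_RELATIME, options);
}
```

Checking the return value of mount and printing the error with perror makes problems such as a missing directory much easier to diagnose.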
lowerdir should be the image directory. In principle, upperdir and workdir can be any directory,
but in order for the overlay filesystem to work inside the Dev Container, those directories must be
inside /tmp/container . main creates this directory.
Use /tmp/container/{id}/upper and /tmp/container/{id}/work for upperdir and workdir,
respectively. In order for mount to work, those directories must exist, and so must the merged
directory used as the mount target. Use mkdir to create a directory if it does not exist.
For example, if the current directory is /workspaces/project5-container , the container ID is
my-container , and the image name is alpine , you need to call
mount(
"overlay",
"/tmp/container/my-container/merged",
"overlay",
MS_RELATIME,
"lowerdir=/workspaces/project5-container/images/alpine,upperdir=/tmp/container/my-contain
);
Change the root mount
Now, the overlay filesystem is mounted at /tmp/container/{id}/merged . We want the child process
to treat this as the root directory.
pivot_root is the system call to achieve this. Because calling pivot_root is complex and tedious,
we provided a helper function to do so.
void change_root(const char* path)
Provide the path to the "merged" directory to the change_root function. It will call pivot_root to
change the root directory to the "merged" directory and ensure it cannot access outside directories.
change_root also does a couple more things to ensure the container works properly, such as
setting the PATH environment variable.
Execute the command
At this point, the child process has its own PID namespace and the overlay filesystem as its root
directory. The last step is to execute the specified command.
Use execvp(3) so it can execute commands without specifying the full path to the executable.
int execvp(const char *file, char *const argv[]);
file specifies the name of the command, and argv specifies the full argument list, starting with
the command name itself. argv needs to be null-terminated.
For example, if the command is echo "hello world" , you should call
char *argument_list[] = {"echo", "hello world", NULL};
execvp(argument_list[0], argument_list);
Testing
Testing with the alpine image
You can test your container runtime using the alpine image described earlier. In particular, we
want to make sure processes and the filesystem are isolated.
To check the process isolation, we can use the ps command.
$ sudo ./container my-container alpine sh
--- inside container ---
$ ps -A
PID USER TIME COMMAND
1 root 0:00 sh
2 root 0:00 ps -A
ps -A must not print the processes running on the host. The command used to create the
container ( sh in the above example) should have PID 1.
To check the file system isolation, you can use cd inside the container to try to get out of the file
system. If change_root is called properly, you should not be able to get out of the overlay
filesystem.
$ sudo ./container my-container alpine sh
--- inside container ---
$ cd /../../
$ ls
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
Any changes made inside the container should be visible in the upper directory.
$ sudo ./container my-container alpine sh
--- inside container ---
$ echo hello from container > hello.txt
$ exit
--- returned to host ---
$ sudo cat /tmp/container/my-container/upper/hello.txt
hello from container
Testing with other images
While our container runtime is minimal, it is capable of running a variety of images. Once you are
done testing with the alpine image, try using your container runtime to execute your favorite image.
For example, here is how to execute JavaScript (Node.js) using the node:18-alpine image.
# follow similar steps to create an image directory
$ docker pull node:18-alpine
$ docker run --rm -it node:18-alpine sh
# in a different terminal
$ docker ps # copy the container ID
$ docker export {container-id} > node.tar
$ mkdir images/node
$ tar -xf node.tar -C images/node
$ sudo ./container node-container node node
Welcome to Node.js v18.16.0.
Type ".help" for more information.
>
Error: Could not open history file.
REPL session history will not be persisted.
> console.log("hello, world!")
hello, world!
undefined
>
Try running your favorite programming language with your container runtime!
Notes
make must create the container executable.
All source files must be formatted using clang-format. Run make format to format .c and .h files.
The filesystems can sometimes end up in a bad state. If they behave strangely, try running the "Dev
Containers: Rebuild Container" command in VS Code. This recreates the Dev Container and is
likely to resolve the issue. You can also try restarting Docker Desktop.
Acknowledgements
Kernel OverlayFS document
ianlewis/execc: A simple container runtime in bash
Linux containers from scratch - diyC
Container Creation Using Namespaces and Bash | Nicolas Mesa