About#
In this article we explore concepts related to container technology. We start with the basics: what containers and virtual machines are. We then explore the Open Container Initiative specifications (image-spec and runtime-spec), and later Docker- and Podman-specific concepts.
Introduction#
What are Virtual Machines?#
- Applications are generally deployed in a virtual machine.
- A Virtual Machine (VM) is a compute resource that uses software instead of a physical computer to run programs and deploy apps.
- Each virtual machine runs its own operating system and functions separately from the other VMs, even when they are all running on the same host.
What are Containers?#
Containers are a technology that allows applications to be packaged and isolated with their entire runtime environment.
Why use Containers?#
- With virtual machines, the computational overhead spent virtualizing hardware for a guest OS to use is substantial; containers avoid it.
- Containers make it easier to maintain consistent behavior and functionality while moving the contained application between environments.
- Containers share the host machine's OS kernel and therefore do not require an OS per application, which drives higher server efficiency.
What is the Open Container Initiative (OCI)?#
The Open Container Initiative (OCI) is a lightweight, open governance structure (project) for the express purpose of creating open industry standards around container formats and runtimes.
The OCI currently contains three specifications:
- The Runtime Specification (runtime-spec).
- The Image Specification (image-spec).
- The Distribution Specification (distribution-spec).
Runtime Specification#
The Open Container Initiative Runtime Specification aims to specify the configuration, execution environment, and lifecycle of a container. The Runtime Specification outlines how to run a “filesystem bundle” that is unpacked on disk.
A container’s configuration is specified in the config.json for the supported platforms and details the fields that enable the creation of a container. The execution environment is specified to ensure that applications running inside a container have a consistent environment between runtimes, along with common actions defined for the container’s lifecycle.
Application bundle builders can create a bundle directory that includes all the files required to launch an application as a container. The bundle contains an OCI configuration file (config.json) where the builder can specify host-independent details such as which executable to launch (process object defined in the config.json file) and host-specific settings such as mount locations, hook paths, Linux namespaces and cgroups.
What is a file system bundle?#
A set of files organized in a certain way and containing all the necessary data and metadata for any compliant runtime to perform all standard operations against it.
A container is encoded as a filesystem bundle on disk. The definition of a bundle is concerned only with how a container and its configuration data are stored on a local filesystem so that they can be consumed by a compliant runtime.
A Standard Container bundle contains all the information needed to load and run a container. This includes the following artifacts:
- `config.json` containing all configuration data. (File is mandatory)
- The container's root filesystem, referred to by `root.path` in the `config.json` file. (Optional but mandatory in Windows)
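As a rough illustration of the two artifacts listed above, the following Python sketch checks a directory for them. The function name `check_bundle` and the wording of the messages are hypothetical; a real runtime validates far more of `config.json` than this.

```python
import json
from pathlib import Path


def check_bundle(bundle_dir: str) -> list[str]:
    """Return a list of problems found in a bundle directory (empty if none)."""
    bundle = Path(bundle_dir)
    config_path = bundle / "config.json"
    if not config_path.is_file():
        return ["config.json is missing"]

    config = json.loads(config_path.read_text())
    problems = []
    root = config.get("root")
    if root is not None:
        path = root.get("path")
        if not path:
            problems.append("root is set but root.path is empty")
        # A relative root.path is resolved against the bundle directory.
        elif not (bundle / path).is_dir():
            problems.append(f"root filesystem {path!r} does not exist")
    return problems
```

An empty list means both artifacts were found where `config.json` says they should be.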
Scope of a Container#
The entity using a runtime to create a container MUST be able to use the operations defined in this specification against that same container. Whether other entities using the same, or other, instance of the runtime can see that container is out of scope of this specification.
State of a Container#
The state of a container includes the following properties:
- ociVersion
- id
- status (Additional values MAY be defined by the runtime; however, they MUST be used to represent new runtime states not defined below.)
  - creating: The container is being created.
  - created: The runtime has finished the create operation, and the container process has neither exited nor executed the user-specified program.
  - running: The container process has executed the user-specified program but has not exited.
  - stopped: The container process has exited.
- pid
- bundle
- annotations
//Example of state
{
"ociVersion": "0.2.0",
"id": "oci-container1",
"status": "running",
"pid": 4422,
"bundle": "/containers/redis",
"annotations": {
"myKey": "myValue"
}
}
Runtime Lifecycle#
The lifecycle describes the timeline of events that happen from when a container is created to when it ceases to exist.
1. The OCI `create` operation is invoked.
2. The container's runtime environment MUST be created according to the configuration in `config.json`. While the resources requested in the `config.json` MUST be created, the user-specified program MUST NOT be run at this time. Any updates to `config.json` after this step MUST NOT affect the container.
3. The `prestart` hooks are invoked.
4. The `createRuntime` hooks are invoked.
5. The `createContainer` hooks are invoked.
6. The runtime's `start` command is invoked with the unique identifier of the container.
7. The `startContainer` hooks are invoked.
8. The runtime MUST run the user-specified program, as specified by `process`. (The `process` object is defined in the `config.json`.)
9. The `poststart` hooks are invoked.
10. The container process exits. This MAY happen due to erroring out, exiting, crashing or the runtime's `kill` operation being invoked.
11. The runtime's `delete` command is invoked with the unique identifier of the container.
12. The container MUST be destroyed by undoing the steps performed during the create phase (step 2).
13. The `poststop` hooks are invoked.
Operations#
Unless otherwise stated, runtimes MUST support the following operations. (These operations are not specifying any command-line APIs, and the parameters are inputs for general operations.)
Container tools expose these operations through their own CLIs. The command names may differ, but the underlying operations should be consistent with the OCI runtime-spec.
- `query state`: This operation MUST return the state of a container as specified in State of a Container.
- `create`: This operation MUST create a new container. Any changes made to the `config.json` file after this operation will not have an effect on the container.
- `start`: This operation MUST run the user-specified program as specified by `process`.
- `kill`: This operation MUST send the specified signal to the container process.
- `delete`: Attempting to delete a container that is not stopped MUST have no effect on the container and MUST generate an error. Deleting a container MUST delete the resources that were created during the `create` step. Note that resources associated with the container, but not created by this container, MUST NOT be deleted.
  - For example, volumes and mounts are not deleted.
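The operations and status values described above behave like a small state machine. The sketch below models only the status transitions; the class `Container`, the exception `LifecycleError` and the method `process_exited` are hypothetical names, and no real namespaces, cgroups or processes are involved.

```python
class LifecycleError(Exception):
    """Raised when an operation is not valid for the current status."""


class Container:
    def __init__(self, container_id: str, bundle: str) -> None:
        self.id = container_id
        self.bundle = bundle
        self.status = "creating"
        self.deleted = False

    def _require(self, *allowed: str) -> None:
        if self.status not in allowed:
            raise LifecycleError(f"{self.id}: status is {self.status!r}")

    def create(self) -> None:
        # A real runtime sets up the environment here but does not
        # run the user-specified program yet.
        self._require("creating")
        self.status = "created"

    def start(self) -> None:
        self._require("created")
        self.status = "running"

    def process_exited(self) -> None:
        # Covers a normal exit, a crash, or a signal sent through kill.
        self._require("created", "running")
        self.status = "stopped"

    def delete(self) -> None:
        # Deleting a container that is not stopped is an error
        # and leaves the container untouched.
        self._require("stopped")
        self.deleted = True

    def state(self) -> dict:
        return {"id": self.id, "status": self.status, "bundle": self.bundle}
```

Calling `delete()` on a container that is still `created` or `running` raises `LifecycleError` and changes nothing, which mirrors the rule for the `delete` operation.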
Configuration#
This configuration file contains metadata necessary to implement standard operations against the container. This includes the process to run, environment variables to inject, sandboxing features to use, etc.
- Refer to https://github.com/opencontainers/runtime-spec/blob/main/config.md#platform-specific-configuration for the full configuration details.
Image Specification#
This specification defines an OCI Image, consisting of an image manifest, an image index (optional), a set of filesystem layers, and a configuration.
Image Manifest#
At a high level, the image manifest contains metadata about the contents and dependencies of the image, including the content-addressable identity of one or more filesystem layer changeset archives that will be unpacked to make up the final runnable filesystem.
Image Configuration#
The image configuration includes information such as application arguments, environment variables, etc.
Image Index#
The image index is a higher-level manifest that points to a list of manifests and descriptors. Typically, these manifests may provide different implementations of the image, possibly varying by platform or other attributes.
Content Descriptors#
- An OCI image consists of several different components arranged in a Merkle Directed Acyclic Graph (DAG).
- References between components in the graph are expressed through Content Descriptors.
- A Content Descriptor, or simply Descriptor, describes the disposition (the way in which something is placed or arranged, especially in relation to other things) of the targeted content.
- The content identifier is the digest.
- The media type defining the descriptor is:
application/vnd.oci.descriptor.v1+json
A canonical form is a representation such that every object has a unique representation, so the equality of two objects can be tested by comparing their canonical forms. Canonicalization is the process through which a representation is put into its canonical form. For example, the content {'a':1, 'b':2} and {'b':2, 'a':1} is logically the same but can produce different digests, because a digest is computed over the exact bytes. Therefore, canonicalization is used when saving content in OCI.
echo -n {'a':1,'b':2} | sha256sum
d8766531781e268ee6fe73b2333041ca231ac61f059874afe0d10c395421b388
echo -n {'b':2,'a':1} | sha256sum
d644ddd8c7d5668b270da1e1d8a51a3c8b0a4c7458513a85cbf056b4414f4b65
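The same effect can be reproduced in Python. The sketch below assumes one possible canonical form (sorted keys and compact separators); that choice is an assumption made for illustration, not a rule taken from the OCI specifications, and the function names are hypothetical.

```python
import hashlib
import json


def raw_digest(text: str) -> str:
    """Digest of the text exactly as written, byte for byte."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonical_digest(obj) -> str:
    """Digest of the object after serializing it in one fixed form."""
    # Sorted keys and fixed separators give one byte sequence per object.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return raw_digest(payload)


first = '{"a":1,"b":2}'
second = '{"b":2,"a":1}'

# Same logical content, different bytes, so the raw digests differ.
print(raw_digest(first) == raw_digest(second))
# After canonicalization both serialize identically.
print(canonical_digest(json.loads(first)) == canonical_digest(json.loads(second)))
```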
Image Layout Specification#
- The OCI Image Layout is the directory structure for OCI content-addressable blobs and location-addressable references (refs).
Given an image layout and a ref, a tool can create an OCI Runtime Specification bundle by:
- Following the ref to find a manifest, possibly via an image index
- Applying the filesystem layers in the specified order
- Converting the image configuration into an OCI Runtime Specification `config.json`
Structure#
The image layout is as follows:
- `blobs` directory:
  - Contains content-addressable blobs
  - A blob has no schema and SHOULD be considered opaque
  - Directory MUST exist and MAY be empty
- `oci-layout` file:
  - It MUST exist and be a JSON object.
  - It MUST contain an `imageLayoutVersion` field
- `index.json` file:
  - It MUST exist and be an image index JSON object.
Blobs#
- Object names in the `blobs` subdirectories are composed of a directory for each hash algorithm, the children of which will contain the actual content.
- The content of `blobs/<alg>/<encoded>` MUST match the digest `<alg>:<encoded>` (referenced per descriptor). For example, the content of `blobs/sha256/da39a3ee5e6b4b0d3255bfef95601890afd80709` MUST match the digest `sha256:da39a3ee5e6b4b0d3255bfef95601890afd80709`.
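The rule that a blob's content must match its digest can be checked mechanically. This is a minimal sketch, assuming the digest algorithm is one that Python's `hashlib` knows by the same name (such as `sha256`); the function names `blob_path` and `verify_blob` are hypothetical.

```python
import hashlib
from pathlib import Path


def blob_path(layout_dir: str, digest: str) -> Path:
    """Map a digest of the form <alg>:<encoded> to blobs/<alg>/<encoded>."""
    algorithm, encoded = digest.split(":", 1)
    return Path(layout_dir) / "blobs" / algorithm / encoded


def verify_blob(layout_dir: str, digest: str) -> bool:
    """Return True if the blob's content hashes to the digest it is stored under."""
    algorithm, encoded = digest.split(":", 1)
    data = blob_path(layout_dir, digest).read_bytes()
    return hashlib.new(algorithm, data).hexdigest() == encoded
```

Reading the whole blob into memory keeps the sketch short; large layers would normally be hashed in chunks.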
oci-layout file#
- This JSON object serves as a marker for the base of an Open Container Image Layout and to provide the version of the image-layout in use.
- The media type defining the image layout specification is:
application/vnd.oci.layout.header.v1+json
index.json file#
- It is the entry point for references and descriptors of the image layout.
- The image index is a multi-descriptor entry point.
- This index provides an established path (`/index.json`) to have an entry point for an image-layout and to discover auxiliary descriptors.
- In general the `mediaType` of each descriptor object in the manifests field will be either `application/vnd.oci.image.index.v1+json` or `application/vnd.oci.image.manifest.v1+json`.
- An encountered `mediaType` that is unknown MUST NOT generate an error.
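A tool reading `index.json` might therefore keep the descriptors it understands and skip the rest. The sketch below shows that behavior; the function name `known_entry_points` is hypothetical, and skipping silently is one reasonable reading of the rule, not the only one.

```python
import json
from pathlib import Path

KNOWN_MEDIA_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
}


def known_entry_points(layout_dir: str) -> list[dict]:
    """Return the descriptors in index.json whose mediaType is understood."""
    index = json.loads((Path(layout_dir) / "index.json").read_text())
    found = []
    for descriptor in index.get("manifests", []):
        if descriptor.get("mediaType") in KNOWN_MEDIA_TYPES:
            found.append(descriptor)
        # Unknown media types are skipped instead of raising an error.
    return found
```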
Image Index Specification#
- The image index is a higher-level manifest that points to specific image manifests, ideal for one or more platforms. While the use of an image index is OPTIONAL for image providers, image consumers SHOULD be prepared to process them.
- This section defines the `application/vnd.oci.image.index.v1+json` media type.
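To illustrate how a consumer could use an image index, the sketch below picks the manifest descriptor whose `platform` matches a requested operating system and CPU architecture. The function name `select_manifest` and the sample digests are hypothetical, and real tools also consider fields such as `variant`.

```python
from typing import Optional


def select_manifest(index: dict, os_name: str, architecture: str) -> Optional[dict]:
    """Return the first descriptor in the index matching the platform, if any."""
    for descriptor in index.get("manifests", []):
        platform = descriptor.get("platform", {})
        if (platform.get("os") == os_name
                and platform.get("architecture") == architecture):
            return descriptor
    return None


# Hypothetical index with two platform-specific manifests.
example_index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "a" * 64,
            "size": 1024,
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "b" * 64,
            "size": 1024,
            "platform": {"architecture": "arm64", "os": "linux"},
        },
    ],
}
```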
Image Manifest Specification#
The media type defined by this section is application/vnd.oci.image.manifest.v1+json. There are three main goals of the Image Manifest Specification:
- To support content-addressable images, through an image model where the image's configuration can be hashed to generate a unique ID for the image and its components.
- To allow multi-architecture images, through a “fat manifest” which references image manifests for platform-specific versions of an image. In OCI, this is codified in an image index.
- To be translatable to the OCI Runtime Specification.
An image manifest provides a configuration and set of layers for a single container image for a specific architecture and operating system.
Image Configuration#
- An OCI Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime.
- This specification outlines the JSON format describing images for use with a container runtime and execution tool and its relationship to filesystem changesets.
- The media type `application/vnd.oci.image.config.v1+json` defines the image configuration.
Terminology#
Layer#
- Image filesystems are composed of layers.
- Each layer represents a set of filesystem changes in a tar-based layer format, recording files to be added, changed, or deleted relative to its parent layer.
- Layers do not have configuration metadata such as environment variables or default arguments; these are properties of the image as a whole rather than any particular layer.
- Using a layer-based or union filesystem such as AUFS, or by computing the diff from filesystem snapshots, the filesystem changeset can be used to present a series of image layers as if they were one cohesive filesystem.
- One or more layers are applied on top of each other to create a complete filesystem.
- The media type `application/vnd.oci.image.layer.v1.tar+gzip` represents an `application/vnd.oci.image.layer.v1.tar` payload which has been compressed with gzip.
- The media type `application/vnd.oci.image.layer.v1.tar+zstd` represents an `application/vnd.oci.image.layer.v1.tar` payload which has been compressed with zstd.
- Layer Changesets for the media type `application/vnd.oci.image.layer.v1.tar` MUST be packaged in a tar archive.
Change Types#
Types of changes that can occur in a changeset are:
- Additions
- Modifications
- Removals
JSON#
- Each image has an associated JSON structure that describes some basic information about the image, such as date created, author, as well as execution/runtime configuration like its entrypoint, default arguments, networking, and volumes.
- The JSON structure also references a cryptographic hash of each layer used by the image, and provides history information for those layers.
- This JSON is considered to be immutable because changing it would change the computed ImageID.
- Changing it means creating a new derived image, instead of changing the existing image.
Layer DiffID#
- A layer DiffID is the digest over the layer’s uncompressed tar archive and serialized in the descriptor digest format.
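Because the DiffID is taken over the uncompressed tar archive, it generally differs from the digest of the compressed blob stored in the layout. The sketch below assumes a gzip-compressed layer and SHA-256; the function names are hypothetical.

```python
import gzip
import hashlib


def blob_digest(compressed_layer: bytes) -> str:
    """Digest of the layer blob as stored (still compressed)."""
    return "sha256:" + hashlib.sha256(compressed_layer).hexdigest()


def diff_id(compressed_layer: bytes) -> str:
    """Digest of the layer's uncompressed tar archive."""
    uncompressed = gzip.decompress(compressed_layer)
    return "sha256:" + hashlib.sha256(uncompressed).hexdigest()
```

Re-compressing the same tar archive with different settings changes `blob_digest` but leaves `diff_id` unchanged.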
Chain ID#
- It is sometimes useful to refer to a stack of layers with a single identifier.
- While a layer's `DiffID` identifies a single changeset, the `ChainID` identifies the subsequent application of those changesets.
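In the image-spec, the ChainID of the first layer is its DiffID, and each later ChainID is the digest of the previous ChainID, a space, and the next DiffID. The sketch below follows that recursion with SHA-256; the function name `chain_ids` is hypothetical.

```python
import hashlib


def chain_ids(diff_ids: list[str]) -> list[str]:
    """Return the ChainID for each prefix of an ordered list of DiffIDs."""
    chain: list[str] = []
    for diff_id in diff_ids:
        if not chain:
            # A single layer's ChainID equals its DiffID.
            chain.append(diff_id)
        else:
            combined = f"{chain[-1]} {diff_id}".encode("utf-8")
            chain.append("sha256:" + hashlib.sha256(combined).hexdigest())
    return chain
```

Two images that share the same bottom layers in the same order get the same leading ChainIDs, which is what makes the identifier useful for a stack of layers.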
Image ID#
- Each image’s ID is given by the SHA256 hash of its configuration JSON.
Properties#
- `created`: A combined date and time at which the image was created.
- `author`: Gives the name and/or email address of the person or entity that created and is responsible for maintaining the image.
- `architecture`: The CPU architecture on which the binaries in this image are built to run.
- `os`: The name of the operating system on which the image is built to run.
- `os.version`: This property specifies the version of the operating system targeted by the referenced blob.
- `os.features`: This property specifies an array of strings, each specifying a mandatory OS feature.
- `variant`: The variant of the specified CPU architecture.
- `config`: The execution parameters that SHOULD be used as a base when running a container using the image.
  - `User`: The username or UID which is a platform-specific structure that allows specific control over which user the process runs as.
  - `ExposedPorts`: A set of ports to expose from a container running this image. Its keys can be in the format of `port/tcp`, `port/udp`, or `port`, with the default protocol being `tcp` if not specified.
  - `Env`: Entries are in the format of `VARNAME=VARVALUE`. These values act as defaults and are merged with any specified when creating a container.
  - `Entrypoint`: A list of arguments to use as the command to execute when the container starts. These values act as defaults and may be replaced by an entrypoint specified when creating a container.
  - `Cmd`: Default arguments to the entrypoint of the container. If an `Entrypoint` value is not specified, then the first entry of the `Cmd` array SHOULD be interpreted as the executable to run.
  - `Volumes`: A set of directories describing where the process is likely to write data specific to a container instance.
  - `WorkingDir`: Sets the current working directory of the entrypoint process in the container. This value acts as a default and may be replaced by a working directory specified when creating a container.
  - `Labels`: This field contains arbitrary metadata for the container.
  - `StopSignal`: This field contains the system call signal that will be sent to the container to exit.
- `rootfs`: The rootfs key references the layer content addresses used by the image. This makes the image config hash depend on the filesystem hash.
  - `type`: MUST be set to `layers`.
  - `diff_ids`: An array of layer content hashes (DiffIDs), in order from first to last.
- `history`: Describes the history of each layer. The array is ordered from first to last.
  - `created`: A combined date and time at which the layer was created.
  - `author`: The author of the build point.
  - `created_by`: The command that created the layer.
  - `comment`: A custom message set when creating the layer.
  - `empty_layer`: This field is used to mark if the history item created a filesystem diff. It is set to true if this history item doesn't correspond to an actual layer in the rootfs section.
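Two of the `config` rules above are easy to get wrong: how `Entrypoint` and `Cmd` combine, and how `Env` defaults merge with values given at container creation. The following sketch shows one way to express them; the function names `default_argv` and `merge_env` are hypothetical, and letting the container's value win on a name clash is an assumption about what "merged" means.

```python
def default_argv(config: dict) -> list[str]:
    """Combine Entrypoint and Cmd into the default command line."""
    entrypoint = config.get("Entrypoint") or []
    cmd = config.get("Cmd") or []
    # With no Entrypoint, the first entry of Cmd ends up as the executable.
    return [*entrypoint, *cmd]


def merge_env(image_env: list[str], container_env: list[str]) -> list[str]:
    """Merge VARNAME=VARVALUE entries; later entries override earlier ones."""
    merged: dict[str, str] = {}
    for entry in [*image_env, *container_env]:
        name, _, value = entry.partition("=")
        merged[name] = value
    return [f"{name}={value}" for name, value in merged.items()]
```

For example, `default_argv({"Cmd": ["sh", "-c", "ls"]})` yields a command line whose executable is `sh`.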
Conversion to OCI Runtime Configuration#
When extracting an OCI Image into an OCI Runtime bundle, two orthogonal components of the extraction are relevant:
- Extraction of the root filesystem from the set of filesystem layers.
- Conversion of the image configuration blob to an OCI Runtime configuration blob.
- All the necessary system libraries and dependencies of the application are referenced as layers.
- The image manifest specifies the CPU architecture for which the previous two elements are suitable.
- The image index contains information about a set of images that can span a variety of architectures and operating systems.
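The conversion of the image configuration described in this section can be sketched as a mapping from image `config` fields to the `process` object of a runtime `config.json`. This is a partial sketch covering only three fields; the function name `to_runtime_process` is hypothetical, and the fallback working directory of `/` is an assumption made here so that `cwd` is never empty.

```python
def to_runtime_process(image_config: dict) -> dict:
    """Build a runtime-spec style process object from an image configuration."""
    config = image_config.get("config") or {}
    entrypoint = config.get("Entrypoint") or []
    cmd = config.get("Cmd") or []
    return {
        # Entrypoint followed by Cmd becomes the argument vector.
        "args": [*entrypoint, *cmd],
        # Env entries are already in VARNAME=VARVALUE form.
        "env": list(config.get("Env") or []),
        # Assumption: fall back to "/" when no WorkingDir is set.
        "cwd": config.get("WorkingDir") or "/",
    }
```

Extracting the root filesystem from the layers is the other, independent half of the work and is not shown here.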
A file system is a structure used by an operating system to organise and manage files on a storage device such as a hard drive, solid state drive (SSD), or USB flash drive. It defines how data is stored, accessed, and organised on the storage device.
Common file systems:
- FAT (File Allocation Table), FAT16, FAT32
- exFAT (Extended File Allocation Table)
- NTFS (New Technology File System)
- APFS (Apple File System)
- HFS, HFS+ (Hierarchical File System)
- Ext4 (Fourth Extended File System)
BLOB stands for a “Binary Large Object,” a data type that stores binary data. Binary Large Objects (BLOBs) can be complex files like images or videos, unlike other data strings that only store letters and numbers. A BLOB will hold multimedia objects to add to a database.
An archive file stores the content of one or more computer files, possibly compressed and/or encrypted, with associated metadata such as file name, directory structure, error detection and correction information, and commentary. In computing, tar is a shell command for combining multiple computer files into a single archive file. A tarball contains metadata for the contained files including the name, ownership, timestamps, permissions and directory organization.
A changeset describes the exact differences between two successive versions in the version control system’s repository of changes.
References#
- https://spacelift.io/blog/docker-entrypoint-vs-cmd
- https://docs.docker.com/reference/dockerfile/
- https://docs.docker.com/reference/api/engine/version/v1.51/
- https://spacelift.io/blog/docker-commands-cheat-sheet
- https://docs.docker.com/reference/cli/dockerd#description
- https://docs.docker.com/build/concepts/context/#what-is-a-build-context
- https://docs.docker.com/engine/storage/
- https://docs.docker.com/engine/storage/volumes/
- https://pythonspeed.com/articles/multi-stage-docker-python/
- https://www.vmware.com/topics/virtual-machine
- https://www.redhat.com/en/topics/containers
- https://www.docker.com/resources/what-container/
- https://opencontainers.org/about/overview/
- https://github.com/opencontainers/runtime-spec
- https://github.com/opencontainers/runtime-spec/blob/main/schema/config-schema.json
