Kubernetes is a software system that allows you to easily deploy and manage containerised applications on top of it. It aims to provide the best possible utilisation of your hardware resources (both bare-metal and virtual) while still maintaining complete isolation of the hosted applications.

Kubernetes abstracts away the infrastructure and enables you to run your software on thousands of nodes as if all those nodes were a single, enormous computer. Deploying applications through Kubernetes is therefore always the same, whether your cluster contains only a couple of nodes or thousands of them; the size of the cluster makes no difference at all.

Kubernetes obviates the need to implement certain infrastructure-related services in your apps, such as service discovery, scaling, load balancing, self-healing and even leader election; Kubernetes manages all of this for you. Developers can thus focus on implementing the actual features of their applications and not waste time figuring out how to integrate them with the infrastructure.
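For instance, service discovery and load balancing come down to deploying a Service object alongside your pods instead of writing any client-side code. A minimal sketch, assuming a hypothetical app whose pods carry the label `app: my-app` and listen on port 8080:

```yaml
# Hypothetical Service: gives the pods labelled app: my-app a stable
# virtual IP and DNS name, and load-balances traffic across them.
apiVersion: v1
kind: Service
metadata:
  name: my-app          # hypothetical name
spec:
  selector:
    app: my-app         # selects the pods carrying this label
  ports:
  - port: 80            # port the service exposes
    targetPort: 8080    # port the containers listen on
```

Clients inside the cluster then reach the app through the service's stable name instead of tracking individual pod addresses.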

At a hardware level, a Kubernetes cluster is composed of nodes of two types:

  • The master node, which hosts the Kubernetes Control Plane that controls and manages the whole Kubernetes system
  • Worker nodes that run the actual applications you deploy

The Control Plane controls and maintains the state of the cluster. It consists of multiple components that can run on a single master node or be split across multiple nodes and replicated to ensure high availability. These components are:

  • The Kubernetes API Server, with which you (the developer) and the other Control Plane components communicate.
  • The Scheduler, which schedules your apps, i.e. assigns a worker node to each micro-service.
  • The Controller Manager, which performs cluster-level functions such as tracking the cluster's inventory, scaling components out and handling failures.
  • etcd, a distributed data store that persistently stores the cluster configuration.

The Worker Nodes (or simply nodes) are the machines that run your containerised applications. They comprise:

  • Docker, rkt or another container runtime, which runs your containers.
  • The Kubelet, which talks to the API server and manages the containers on its node.
  • The Kubernetes Service Proxy (kube-proxy), which load-balances network traffic between application components.

Running an application in Kubernetes

  1. Package the application into one or more container images
  2. Push these images to an image registry
  3. Post a description of your application to the Kubernetes API server
  4. The Kubernetes API server processes your app’s description
  5. The Scheduler schedules the specified groups of containers onto the available worker nodes
  6. The Kubelet on each of those nodes then instructs the container runtime to pull the required container images and run the containers
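The application description posted to the API server in step 3 is typically a YAML manifest. A minimal sketch of a single-pod descriptor, with a hypothetical image name and registry:

```yaml
# Hypothetical pod descriptor: one pod containing a single container.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  containers:
  - name: my-app
    image: registry.example.com/my-app:1.0   # the image pushed in step 2
    ports:
    - containerPort: 8080                    # port the app listens on
```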

In the figure above, the app descriptor lists four containers, grouped into three sets, called pods. The two containers in the last pod need to run co-located (on the same worker node) and shouldn’t be isolated from each other. Next to each pod is a number indicating how many replicas of that pod need to run in parallel.

Upon receiving this descriptor, the Kubernetes Control Plane will schedule the specified number of replicas of each pod to the available worker nodes. The Kubelets on the nodes will then tell Docker to pull the container images from the image registry and run the containers.
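The desired replica count is declared in the descriptor itself. A sketch using a Deployment object (name, labels and image are hypothetical):

```yaml
# Hypothetical Deployment: asks Kubernetes to keep two replicas of the
# pod template running at all times.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2                 # number of pod copies to run in parallel
  selector:
    matchLabels:
      app: my-app             # pods this Deployment manages
  template:                   # the pod template to replicate
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:1.0
```

If a replica dies, the Control Plane notices the mismatch between the declared and the actual number of pods and starts a replacement.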

Keeping services alive

Once the application is running, Kubernetes continuously makes sure that the deployed state of the application always matches the description you provided.
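Part of this self-healing can be driven by liveness probes declared in the descriptor: the Kubelet periodically runs the probe and restarts the container when it keeps failing. A minimal sketch, assuming a hypothetical health endpoint:

```yaml
# Hypothetical liveness probe: the Kubelet polls this HTTP endpoint and
# restarts the container if the probe fails repeatedly.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: registry.example.com/my-app:1.0
    livenessProbe:
      httpGet:
        path: /healthz        # hypothetical health-check endpoint
        port: 8080
      initialDelaySeconds: 15 # give the app time to start up
      periodSeconds: 10       # probe every 10 seconds
```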