Rancher k8s cluster setup

An experimental record

Jacob Xie published on

Categories: Post

Tags: k8s



Production checklist:


Development checklist:


  1. [IMPORTANT] Set the hostname of each machine (optional):

    ## master
    sudo hostnamectl set-hostname "k8s-master"
    exec bash
    ## node1
    sudo hostnamectl set-hostname "k8s-node1"
    exec bash
    ## node2
    sudo hostnamectl set-hostname "k8s-node2"
    exec bash
    ## node3
    sudo hostnamectl set-hostname "k8s-node3"
    exec bash
  2. [IMPORTANT] Add the IP addresses of the master and worker nodes to /etc/hosts (optional). Here we use four machines for demonstration (the master for etcd and the control plane, the rest as workers; see Checklist for Production-Ready Clusters for more information): k8s-master k8s-node1 k8s-node2 k8s-node3 k8s-node4
  3. Set timezone:

    sudo timedatectl set-timezone Asia/Shanghai
  4. Turn off swap:

    sudo swapoff -a
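
The /etc/hosts step above can be scripted so that re-running it does not create duplicate lines. This is a minimal sketch: the function name add_host_entry is hypothetical, and the 192.0.2.x addresses in the usage note are documentation placeholders, not the real addresses of this cluster.

```shell
# Append "IP NAME" to a hosts file unless NAME is already listed there.
# Usage: add_host_entry FILE IP NAME
add_host_entry() {
  file=$1 ip=$2 name=$3
  # Look for NAME as an exact hostname field on any non-comment line.
  if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                       END { exit !found }' "$file" 2>/dev/null; then
    return 0   # already present, nothing to do
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Run as root, a call such as add_host_entry /etc/hosts 192.0.2.10 k8s-master would add one entry; repeat it for each node with its real address.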


Skip this step if you are using RKE2 & Rancher, and go directly to RKE2.

Install these packages on all of your machines:

  • kubeadm: the command to bootstrap the cluster
  • kubelet: the component that runs on all of the machines in your cluster and does things like starting pods and containers
  • kubectl: the command line util to talk to your cluster
  1. Import the GPG key. This step is very important, especially when no proxy server is available:

    sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg  https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg

    or by using tuna source:

    sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg  https://mirrors.tuna.tsinghua.edu.cn/kubernetes/apt/doc/apt-key.gpg
  2. Create /etc/apt/sources.list.d/kubernetes.list with the following content. This is another important step when setting up mirrors:

    deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main

    or by using tuna source:

    deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.tuna.tsinghua.edu.cn/kubernetes/apt kubernetes-xenial main
  3. Installation:

    sudo apt-get update
    # (to set up the cluster manually with kubeadm instead of using Rancher, install all three)
    # sudo apt-get install -y kubelet kubeadm kubectl
    sudo apt-get install -y kubectl
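
Since the two mirrors differ only in their base URL, the source line can be generated instead of typed by hand. The following is a sketch under that assumption; k8s_repo_line is a hypothetical helper name, while the keyring path and suite name are the ones used in the steps above.

```shell
# Print the apt source line for a given mirror base URL.
# Usage: k8s_repo_line MIRROR_BASE [KEYRING]
k8s_repo_line() {
  mirror=${1%/}   # strip one trailing slash, if any
  keyring=${2:-/usr/share/keyrings/kubernetes-archive-keyring.gpg}
  printf 'deb [signed-by=%s] %s/kubernetes/apt kubernetes-xenial main\n' \
    "$keyring" "$mirror"
}
```

For example, k8s_repo_line https://mirrors.tuna.tsinghua.edu.cn | sudo tee /etc/apt/sources.list.d/kubernetes.list writes the tuna variant of the file.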


Skip this step if you are using RKE2 & Rancher, and go directly to RKE2.

Container runtime.

  1. Uninstall old versions

    sudo apt-get remove docker docker-engine docker.io containerd runc
  2. Update the apt package index and install packages to allow apt to use a repository over HTTPS

    sudo apt-get update
    sudo apt-get install \
      ca-certificates \
      curl \
      gnupg \
      lsb-release
  3. Add Docker’s official GPG key

    sudo mkdir -p /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg \
       --dearmor -o /etc/apt/keyrings/docker.gpg
  4. Set up the repository

    echo \
     "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
     $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  5. Install docker engine

    sudo apt-get update
    sudo apt-get install docker-ce \
       docker-ce-cli \
       containerd.io
  6. Create the docker group if it does not exist

    sudo groupadd docker
  7. Add your user to the docker group.

    sudo usermod -aG docker $USER
  8. Activate the group change by running the following command, or log out and log back in (if that doesn't work, you may need to reboot your machine first):

    newgrp docker
  9. Enable docker start on boot

    sudo systemctl enable docker.service
    sudo systemctl enable containerd.service

    or disable:

    sudo systemctl disable docker.service
    sudo systemctl disable containerd.service
  10. Check if docker can be run without root

    docker run hello-world
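
When docker run still fails with a permission error, it helps to see whether the group change has been recorded at all. A small sketch, with in_group as a hypothetical helper name: id -nG USER reads the group database, whereas id -nG without a user shows the groups of the current session, which only pick up the change after newgrp or a fresh login.

```shell
# Succeed if USER belongs to GROUP according to the group database
# (primary or supplementary group).
# Usage: in_group USER GROUP
in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qx -- "$2"
}
```

Usage: in_group "$USER" docker && echo "docker group is set". If this succeeds but docker still requires sudo, the current shell session has not picked up the new group yet.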


RKE2, also known as RKE Government, is Rancher's next-generation Kubernetes distribution.

It is a fully conformant Kubernetes distribution that focuses on security and compliance within the U.S. Federal Government sector.

To meet these goals, RKE2 does the following:

  • Provides defaults and configuration options that allow clusters to pass the CIS Kubernetes Benchmark v1.6 with minimal operator intervention
  • Enables FIPS 140-2 compliance
  • Regularly scans components for CVEs using trivy in our build pipeline


  • quick start

  • requirements

  • [Ubuntu users skip this step] If you are using CentOS instead of Ubuntu: according to a known issue, configure NetworkManager before installing RKE2 (otherwise reboot first):

    Create a config file called rke2-canal.conf in /etc/NetworkManager/conf.d:


    then reload:

    systemctl reload NetworkManager
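
As a sketch of the NetworkManager step, the helper below writes rke2-canal.conf with the two lines that the RKE2 known-issues documentation gives for this file, as far as I can tell; verify them against the documentation for your RKE2 version. The function name write_rke2_canal_conf and its optional directory argument are hypothetical, added so the step can be tried outside /etc first.

```shell
# Write rke2-canal.conf into DIR (default: /etc/NetworkManager/conf.d).
# The file tells NetworkManager to leave the cali* and flannel* interfaces alone.
# Usage: write_rke2_canal_conf [DIR]
write_rke2_canal_conf() {
  dir=${1:-/etc/NetworkManager/conf.d}
  mkdir -p "$dir" || return 1
  cat > "$dir/rke2-canal.conf" <<'EOF'
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
EOF
}
```

Run it as root without arguments, then reload NetworkManager as shown above.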

Server Node

  1. Switch to root user.

  2. Run the installer: curl -sfL https://get.rke2.io | sh -

  3. Enable the rke2-server service: systemctl enable rke2-server.service

  4. Start the service: systemctl start rke2-server.service

  5. Follow the logs (optional): journalctl -u rke2-server -f


After running this installation:

  • The rke2-server service will be installed. The rke2-server service will be configured to automatically restart after node reboots or if the process crashes or is killed.
  • Additional utilities will be installed at /var/lib/rancher/rke2/bin/. They include: kubectl, crictl, and ctr. Note that these are not on your path by default.
  • Two cleanup scripts will be installed to the path at /usr/local/bin/rke2. They are: rke2-killall.sh and rke2-uninstall.sh.
  • A kubeconfig file will be written to /etc/rancher/rke2/rke2.yaml.
  • A token that can be used to register other server or agent nodes will be created at /var/lib/rancher/rke2/server/node-token

Agent Node

  1. Switch to root user.

  2. Run the installer: curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -

  3. Enable the rke2-agent service: systemctl enable rke2-agent.service

  4. Configure the rke2-agent service:

    mkdir -p /etc/rancher/rke2/
    vim /etc/rancher/rke2/config.yaml


    server: https://<server>:9345
    token: <token from server node>
  5. Start the service: systemctl start rke2-agent.service

  6. Follow the logs (optional): journalctl -u rke2-agent -f
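
The agent configuration in step 4 can also be written without an editor, which is handy when several agents are set up in a row. A minimal sketch, assuming the two-key config.yaml shown above; write_agent_config is a hypothetical helper name and the address in the usage note is a placeholder.

```shell
# Write an RKE2 agent config.yaml into DIR.
# Usage: write_agent_config DIR SERVER TOKEN
write_agent_config() {
  dir=$1 server=$2 token=$3
  mkdir -p "$dir" || return 1
  # The token is a secret: a restrictive umask makes a newly created
  # file readable by its owner only.
  (
    umask 077
    printf 'server: https://%s:9345\ntoken: %s\n' "$server" "$token" \
      > "$dir/config.yaml"
  )
}
```

Usage, as root on the agent: write_agent_config /etc/rancher/rke2 192.0.2.10 "<token from server node>", where the token is the content of /var/lib/rancher/rke2/server/node-token on the server.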

Cluster Access

  1. Switch back from the root user.

  2. Copy the kubeconfig: cp /etc/rancher/rke2/rke2.yaml ~/.kube/config. This works on a remote machine as well; all we have to do is change the server field to the real IP address.

  3. Remove group and world read permissions to silence the kubeconfig WARNING:

    chmod g-rw ~/.kube/config
    chmod o-r ~/.kube/config
  4. Export the environment variables by adding them to ~/.profile (for example with vim):

    export PATH=$PATH:/var/lib/rancher/rke2/bin/
    export KUBECONFIG=~/.kube/config

    and then run source ~/.profile

  5. Check access:

    kubectl get pods --all-namespaces
    helm ls --all-namespaces

    Or specify the location of the kubeconfig file in the command:

    kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get pods --all-namespaces
    helm --kubeconfig /etc/rancher/rke2/rke2.yaml ls --all-namespaces
  6. Accessing the Cluster from Outside with kubectl:

    Copy /etc/rancher/rke2/rke2.yaml to your machine located outside the cluster as ~/.kube/config. Then replace 127.0.0.1 with the IP or hostname of your RKE2 server. kubectl can now manage your RKE2 cluster.
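
Editing the server field of a copied kubeconfig can be done with sed instead of by hand. This sketch assumes the file still contains the default https://127.0.0.1: server address; retarget_kubeconfig is a hypothetical helper name.

```shell
# Point a copied kubeconfig at a reachable server address.
# Usage: retarget_kubeconfig FILE HOST
# HOST must be a plain IP or hostname (no '#' or '&', which sed would interpret).
retarget_kubeconfig() {
  file=$1 host=$2
  tmp=$(mktemp) || return 1
  # Rewrite only the loopback address; the port stays untouched.
  sed "s#https://127\.0\.0\.1:#https://$host:#" "$file" > "$tmp" || return 1
  mv "$tmp" "$file"
  # Owner-only permissions also avoid the group/world-readable warning.
  chmod 600 "$file"
}
```

Usage: retarget_kubeconfig ~/.kube/config k8s-master, followed by kubectl get nodes to confirm the connection.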


Helm is a tool for managing packages of pre-configured Kubernetes resources. These packages are known as Helm charts.

Use Helm to:

  • Find and use popular software packaged as Kubernetes charts
  • Share your own applications as Kubernetes charts
  • Create reproducible builds of your Kubernetes applications
  • Intelligently manage your Kubernetes manifest files
  • Manage releases of Helm packages

See the official Helm documentation and Rancher's Helm documentation.


  1. Install by script (needs a proxy server):

    curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
    chmod 700 get_helm.sh
    ./get_helm.sh

    otherwise, install from released source (check the latest version):

    wget https://get.helm.sh/helm-v3.9.2-linux-amd64.tar.gz

    Extract the archive and move the binary:

    tar -zxvf helm-v3.9.2-linux-amd64.tar.gz
    mv linux-amd64/helm /usr/local/bin/
  2. Initialize a Helm Chart Repository:

    helm repo add bitnami https://charts.bitnami.com/bitnami

    then we can list the charts we can install:

    helm search repo bitnami
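
To avoid hard-coding the release in both the wget and tar commands, the download URL can be derived from the version. A small sketch that follows the URL pattern of the v3.9.2 link above; helm_tarball_url is a hypothetical helper name, and the OS and architecture defaults are assumptions matching that link.

```shell
# Print the download URL of a Helm release tarball.
# Usage: helm_tarball_url VERSION [OS] [ARCH]
helm_tarball_url() {
  version=$1 os=${2:-linux} arch=${3:-amd64}
  printf 'https://get.helm.sh/helm-%s-%s-%s.tar.gz\n' "$version" "$os" "$arch"
}
```

Usage: wget "$(helm_tarball_url v3.9.2)", then extract and move the binary as before.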


Why Rancher?

Rancher is a complete software stack for teams adopting containers. It addresses the operational and security challenges of managing multiple Kubernetes clusters across any infrastructure, while providing DevOps teams with integrated tools for running containerized workloads.


Install by helm

  1. Add the Helm chart repo: helm repo add rancher-stable https://releases.rancher.com/server-charts/stable

  2. Create a namespace for Rancher: kubectl create namespace cattle-system

  3. Install cert-manager (skip this if you have your own certificate):

    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.crds.yaml

    Add the Jetstack Helm repository:

    helm repo add jetstack https://charts.jetstack.io

    Update your local Helm chart repository cache:

    helm repo update

    Install the cert-manager Helm chart:

    helm install cert-manager jetstack/cert-manager \
       --namespace cert-manager \
       --create-namespace \
       --version v1.7.1

    Check cert-manager:

    kubectl get pods --namespace cert-manager
  4. [Option 1] Install Rancher without CA certificate (using cert-manager, add --set tls=external):

    helm install rancher rancher-stable/rancher \
       --namespace cattle-system \
       --set hostname=<DNS name> \
       --set bootstrapPassword=<your secret password> \
       --set tls=external
  5. [Option 2] Install Rancher with a CA certificate, which is recommended:

    Use kubectl with the tls secret type to create the secrets:

    kubectl -n cattle-system create secret tls tls-rancher-ingress \
       --cert=tls.crt \
       --key=tls.key

    See the Rancher documentation for more detail, and for updating the Rancher certificate if needed.

    Install Rancher:

    helm install rancher rancher-stable/rancher \
       --namespace cattle-system \
       --set hostname=<DNS name> \
       --set bootstrapPassword=<your secret password> \
       --set ingress.tls.source=secret
  6. Wait for Rancher to be rolled out:

    kubectl -n cattle-system rollout status deploy/rancher
  7. According to this, modify RKE2 Nginx config /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          config:
            use-forwarded-headers: "true"

    and restart:

    systemctl restart rke2-server.service
  8. Verify that the Rancher server is successfully deployed:

    kubectl -n cattle-system rollout status deploy/rancher
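
On slow networks the rollout check may time out or fail a few times before the images are pulled. One way to script the wait is a generic retry wrapper; this is a sketch, and retry is a hypothetical helper, not part of kubectl or helm.

```shell
# Run a command up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
  attempts=$1 delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Usage: retry 10 30 kubectl -n cattle-system rollout status deploy/rancher --timeout=60s checks the rollout up to ten times with a 30 second pause in between.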


This setup is only for development; do not use it in production.

  1. Start a rancher web server:

    sudo docker run --privileged -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:stable
  2. Visit web (according to the host IP)

  3. Get the container ID with docker ps, then run docker logs <container ID> 2>&1 | grep "Bootstrap Password:" in a terminal to print the bootstrap password, and paste that password into the website

  4. Click the button at the top-left of the website, then Cluster Management, then the Create button at the top-right, and choose Custom

  5. After entering your Cluster Name, keep every default, click Next, and in Node Options select etcd and Control Plane for 140 and Worker for the rest (141, 142 and 143).
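
The grep in step 3 prints the whole log line. To get only the password itself, the line can be trimmed with sed. This sketch assumes the password is everything after the "Bootstrap Password:" marker on that line; extract_bootstrap_password is a hypothetical helper name.

```shell
# Read container logs on stdin and print the first bootstrap password found.
# Usage: docker logs <container ID> 2>&1 | extract_bootstrap_password
extract_bootstrap_password() {
  # On the first matching line: drop everything up to the marker, print, stop.
  sed -n '/Bootstrap Password:/{
s/.*Bootstrap Password: *//p
q
}'
}
```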

Accessing clusters with kubectl:

  1. Log into Rancher. From the Global view, open the cluster that you want to access with kubectl.
  2. Click Copy KubeConfig to Clipboard button.
  3. Paste the contents into a new file on your local computer. Move the file to ~/.kube/config. (Note: The default location that kubectl uses for the kubeconfig file is ~/.kube/config, but you can use any directory and specify it using the --kubeconfig flag, as in this command: kubectl --kubeconfig /custom/path/kube.config get pods)
  4. Set it globally: run echo "export KUBECONFIG=~/.kube/config" >> ~/.bash_profile and then source ~/.bash_profile
  5. Now use kubectl version or kubectl get nodes to check whether the configuration succeeded


  • If there is no usable root user when installing RKE2, initialize the root user on Ubuntu:

    # set a password for root
    sudo passwd root
    # lock the root password (passwd -l disables password login for root)
    sudo passwd -l root
    # switch to a root shell
    sudo -s -H
  • Docker images mirror (optional):

    edit /etc/docker/daemon.json (for example with vim) and add:

    { "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"] }

    then reload services:

    sudo systemctl daemon-reload
    sudo systemctl restart docker
  • Clear all containers and images when a deployment has failed. Be careful with the sudo rm -rf ... command on a machine that runs an RKE2 server/agent, because it will also take down production services.

    docker stop $(docker ps -aq)
    docker rm $(docker ps -aq)
    docker system prune -f
    docker volume rm $(docker volume ls -q)
    docker image rm $(docker image ls -q)
    sudo rm -rf /etc/ceph \
          /etc/cni \
          /etc/kubernetes \
          /opt/cni \
          /opt/rke \
          /run/secrets/kubernetes.io \
          /run/calico \
          /run/flannel \
          /var/lib/calico \
          /var/lib/etcd \
          /var/lib/cni \
          /var/lib/kubelet \
          /var/lib/rancher/rke/log \
          /var/log/containers \
          /var/log/pods

    Note that rm -rf ... is very useful when you encounter an etcd connection refused problem, which usually happens when residual files from a previous cluster are still around.

  • rm: cannot remove '/var/lib/kubelet/pods/<pods-id>': Device or resource busy

    Unmount it with the umount command:

    sudo umount /var/lib/kubelet/pods/<pods-id>

    Or run sudo reboot and then execute the commands above again.

  • Failed to bring up Etcd Plane: etcd cluster is unhealthy. solution

  • node-role.kubernetes.io/controlplane=true:NoSchedule means that no pod can be scheduled onto this node unless it has a matching toleration. To remove a taint, append - to it, for example: kubectl taint nodes node1 key1=value1:NoSchedule-. In our case this taint is expected on the master node, since we set up only one node for etcd and the control plane, so there is no need to remove it.

  • node-role.kubernetes.io/etcd=true:NoExecute: same as above.

  • If wrong values were set during helm install rancher, fix them by upgrading:

    helm upgrade rancher rancher-stable/rancher \
       --namespace cattle-system \
       --set hostname=rancher.my.com \
       --set bootstrapPassword=secret \
       --set tls=external
  • To clean up Rancher, please use rancher-cleanup.
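
Because the rm -rf cleanup in the list above is destructive, it can be worth listing what actually exists on the machine before deleting anything. A minimal sketch; existing_paths is a hypothetical helper name and it only prints, it never removes.

```shell
# Print only those of the given paths that currently exist.
# Usage: existing_paths PATH...
existing_paths() {
  for p in "$@"; do
    # -L also catches dangling symlinks, which -e alone reports as missing.
    if [ -e "$p" ] || [ -L "$p" ]; then
      printf '%s\n' "$p"
    fi
  done
}
```

Usage: existing_paths /etc/cni /opt/cni /var/lib/etcd /var/lib/kubelet shows which residual directories are present, so the list can be reviewed before it is passed to sudo rm -rf.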