Rancher k8s cluster setup

An experimental record

Jacob Xie published on July 18, 2022

12 min, 2374 words

Categories: Post

Tags: k8s

Clarification

Production

Production checklist:

Development

Development checklist:

Prerequisites

[IMPORTANT] Config machine name (optional):

## master
sudo hostnamectl set-hostname "k8s-master"
exec bash

## node1
sudo hostnamectl set-hostname "k8s-node1"
exec bash

## node2
sudo hostnamectl set-hostname "k8s-node2"
exec bash

## node3
sudo hostnamectl set-hostname "k8s-node3"
exec bash

[IMPORTANT] Config /etc/hosts by adding master & workers' IP addresses (optional). Here we use four machines for demonstration (master for etcd & control plane, and the rest for worker. Visit Checklist for Production-Ready Clusters for more information):
```
192.168.50.140 k8s-master
192.168.50.141 k8s-node1
192.168.50.142 k8s-node2
192.168.50.143 k8s-node3
192.168.50.144 k8s-node4
```

Set timezone:

sudo timedatectl set-timezone Asia/Shanghai

Turn off swap:
```
sudo swapoff -a
```

Kubectl

Skip this step if using RKE2 & Rancher, directly goto RKE2.

Install these packages on all of your machines:

~~kubeadm: the command to bootstrap the cluster~~
~~kubelet: the component that runs on all of the machines in your cluster and does things like starting pods and containers~~
kubectl: the command line util to talk to your cluster

Import gpg key. This step is very import especially lacking of a proxy server:

sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg  https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg

or by using tuna source:

sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg  https://mirrors.tuna.tsinghua.edu.cn/kubernetes/apt/doc/apt-key.gpg

Create /etc/apt/sources.list.d/kubernetes.list. Another important step of setting up mirrors:

deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main

or by using tuna source:

deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.tuna.tsinghua.edu.cn/kubernetes/apt kubernetes-xenial main

Installation:

sudo apt-get update
# (instead of using Rancher, manual setup cluster by kubeadm)
# sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-get install -y kubectl

Docker

Skip this step if using RKE2 & Rancher, directly goto RKE2.

Container runtime.

Uninstall old versions

sudo apt-get remove docker docker-engine docker.io containerd runc

Update the apt package index and install packages to allow apt to use a repository over HTTPS

sudo apt-get update
sudo apt-get install \
  ca-certificates \
  curl \
  gnupg \
  lsb-release

Add Docker’s official GPG key

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg \
   --dearmor -o /etc/apt/keyrings/docker.gpg

Set up the repository

echo \
 "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
 $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Install docker engine

sudo apt-get update
sudo apt-get install docker-ce \
   docker-ce-cli \
   containerd.io \
   docker-compose-plugin

Create the docker group if it does not exist
```
sudo groupadd docker
```
Add your user to the docker group.
```
sudo usermod -aG docker $USER
```
Run the following command or Logout and login again and run (that doesn't work you may need to reboot your machine first)
```
newgrp docker
```

Enable docker start on boot

sudo systemctl enable docker.service
sudo systemctl enable containerd.service

or disable:

sudo systemctl disable docker.service
sudo systemctl disable containerd.service

Check if docker can be run without root
```
docker run hello-world
```

RKE2

RKE2, also known as RKE Government, is Rancher's next-generation Kubernetes distribution.

It is a fully conformant Kubernetes distribution that focuses on security and compliance within the U.S. Federal Government sector.

To meet these goals, RKE2 does the following:

Provides defaults and configuration options that allow clusters to pass the CIS Kubernetes Benchmark v1.6 with minimal operator intervention

Enables FIPS 140-2 compliance

Regularly scans components for CVEs using trivy in our build pipeline

Note

quick start
requirements
[Ubuntu user skip this step] If using CentOS instead of Ubuntu: According to a known issue, config NetworkManager before install RKE2 (otherwise reboot first):

Create a config file called rke2-canal.conf in /etc/NetworkManger/conf.d:
```
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
```
then reload:
```
systemctl reload NetworkManager
```

Server Node

Switch to root user.
Run the installer: curl -sfL https://get.rke2.io | sh -
Enable the rke2-server service: systemctl enable rke2-server.service
Start the service: systemctl start rke2-server.service
Follow the logs (optional): journalctl -u rke2-server -f

IMPORTANT!

After running this installation:

The rke2-server service will be installed. The rke2-server service will be configured to automatically restart after node reboots or if the process crashes or is killed.

Additional utilities will be installed at /var/lib/rancher/rke2/bin/. They include: kubectl, crictl, and ctr. Note that these are not on your path by default.

Two cleanup scripts will be installed to the path at /usr/local/bin/rke2. They are: rke2-killall.sh and rke2-uninstall.sh.

A kubeconfig file will be written to /etc/rancher/rke2/rke2.yaml.

A token that can be used to register other server or agent nodes will be created at /var/lib/rancher/rke2/server/node-token

Agent Node

Switch to root user.
Run the installer: curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
Enable the rke2-agent service: systemctl enable rke2-agent.service

Configure the rke2-agent service:

mkdir -p /etc/rancher/rke2/
vim /etc/rancher/rke2/config.yaml

Content:

server: https://<server>:9345
token: <token from server node>

Start the service: systemctl start rke2-agent.service
Follow the logs (optional): journalctl -u rke2-agent -f

Cluster Access

Switch out from root user.
Copy kubeconfig by: cp /etc/rancher/rke2/rke2.yaml ~/.kube/config. This works on remote machine as well, all we have to do is to modify server field with real IP address.

Remove group-readable & world-readable WARNING:

chmod g-rw ~/.kube/config
chmod o-r ~/.kube/config

Export environment variable by vim ~/.profile:

export PATH=$PATH:/var/lib/rancher/rke2/bin/
export KUBECONFIG=~/.kube/config

and source ~/.profile

Check accession:

kubectl get pods --all-namespaces
helm ls --all-namespaces

Or specify the location of the kubeconfig file in the command:

kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get pods --all-namespaces
helm --kubeconfig /etc/rancher/rke2/rke2.yaml ls --all-namespaces

Accessing the Cluster from Outside with kubectl:

Copy /etc/rancher/rke2/rke2.yaml on your machine located outside the cluster as ~/.kube/config. Then replace 127.0.0.1 with the IP or hostname of your RKE2 server. kubectl can now manage your RKE2 cluster.

Helm

Helm is a tool for managing packages of pre-configured Kubernetes resources. These packages are known as Helm charts.

Use Helm to:

Find and use popular software packaged as Kubernetes charts

Share your own applications as Kubernetes charts

Create reproducible builds of your Kubernetes applications

Intelligently manage your Kubernetes manifest files

Manage releases of Helm packages

Official Helm document and Rancher's Helm document.

Installation

Install by script (needs a proxy server):

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

otherwise, install from released source (check the latest version):

wget https://get.helm.sh/helm-v3.9.2-linux-amd64.tar.gz

Unzip and move:

tar -zxvf helm-v3.9.2-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/

Initialize a Helm Chart Repository:

helm repo add bitnami https://charts.bitnami.com/bitnami

then we can list the charts we can install:

helm search repo bitnami

Rancher

Why Rancher?

Rancher is a complete software stack for teams adopting containers. It addresses the operational and security challenges of managing multiple Kubernetes clusters across any infrastructure, while providing DevOps teams with integrated tools for running containerized workloads.

Production

Install by helm

Add the Helm chart repo: helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
Create a namespace for Rancher: kubectl create namespace cattle-system

Install cert-manager (skip this if you have your own certificate):

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.crds.yaml

Add the Jetstack Helm repository:

helm repo add jetstack https://charts.jetstack.io

Update your local Helm chart repository cache:

helm repo update

Install the cert-manager Helm chart:

helm install cert-manager jetstack/cert-manager \
   --namespace cert-manager \
   --create-namespace \
   --version v1.7.1

Check cert-manager:

kubectl get pods --namespace cert-manager

[Option 1] Install Rancher without CA certificate (using cert-manager, add --set tls=external):

helm install rancher rancher-stable/rancher \
   --namespace cattle-system \
   --set hostname=<DNS name> \
   --set bootstrapPassword=<your secret password> \
   --set tls=external

[Option 2] Install Rancher with a CA certificate, which is recommended:

Use kubectl with the tls secret type to create the secrets:

kubectl -n cattle-system create secret tls tls-rancher-ingress \
   --cert=tls.crt \
   --key=tls.key

Check this for more detail, and updating the Rancher Certificate if needed.

Install Rancher:

helm install rancher rancher-stable/rancher \
   --namespace cattle-system \
   --set hostname=<DNS name> \
   --set bootstrapPassword=<your secret password> \
   --set ingress.tls.source=secret

Wait for Rancher to be rolled out:

kubectl -n cattle-system rollout status deploy/rancher

According to this, modify RKE2 Nginx config /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml:

 apiVersion: helm.cattle.io/v1
 kind: HelmChartConfig
 metadata:
   name: rke2-ingress-nginx
   namespace: kube-system
 spec:
   valuesContent: |-
     controller:
       config:
         use-forwarded-headers: "true"

Verify that the Rancher server is successfully deployed:

kubectl -n cattle-system rollout status deploy/rancher

Development

This case is only for development, do not use it in production.

Start a rancher web server:

sudo docker run --privileged -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:stable

Visit web https://192.168.50.140 (according to the host IP)
Copy docker logs <container ID> 2>&1 | grep "Bootstrap Password:" to terminal (get container ID by docker ps), by executing this command we will get Bootstrap Password, and paste password back to website
Click ☰ button at the website's top-left, then Cluster Management, then Create button on the top-right, choose Custom
After enter your Cluster Name, keep everything default, click Next and in Node Options, select etcd and Control Plane for 140 and the rest 141, 142 and 143 as Worker.

Accessing clusters with kubectl:

Log into Rancher. From the Global view, open the cluster that you want to access with kubectl.
Click Copy KubeConfig to Clipboard button.
Paste the contents into a new file on your local computer. Move the file to ~/.kube/config. (Note: The default location that kubectl uses for the kubeconfig file is ~/.kube/config, but you can use any directory and specify it using the --kubeconfig flag, as in this command: kubectl --kubeconfig /custom/path/kube.config get pods)
Set global config echo "export KUBECONFIG=~/.kube/config" >> ~/.bash_profile and source ~/.bash_profile
Now we can use kubectl version or kubectl get nodes to check whether configuration is successful or not

Resolutions

In case of no root user when installing RKE2, Ubuntu initialize root user:

# init
sudo passwd root

# change password expire info
sudo passwd -l root

# switch
sudo -s -H

Docker images mirror (optional):

execute vim /etc/docker/daemon.json, add:

{ "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"] }

then reload services:

sudo systemctl daemon-reload
sudo systemctl restart docker

Clear all containers and images when deployment failed. Carefully use sudo rm -rf ... command if using RKE2 server/agent, because it will also unmount production services.

docker stop $(docker ps -aq)
docker rm $(docker ps -aq)
docker system prune -f
docker volume rm $(docker volume ls -q)
docker image rm $(docker image ls -q)
sudo rm -rf /etc/ceph \
      /etc/cni \
      /etc/kubernetes \
      /opt/cni \
      /opt/rke \
      /run/secrets/kubernetes.io \
      /run/calico \
      /run/flannel \
      /var/lib/calico \
      /var/lib/etcd \
      /var/lib/cni \
      /var/lib/kubelet \
      /var/lib/rancher/rke/log \
      /var/log/containers \
      /var/log/pods \
      /var/run/calico

Note that calling rm -rf ... is very useful, when encounter etcd connection refused problem. This usually happened when cached some previous cluster's residual files.

rm: cannot remove '/var/lib/kubelet/pods/<pods-id>': Device or resource busy

Simply by umount command:
```
sudo umount /var/lib/kubelet/pods/<pods-id>
```
Or sudo reboot then execute commands above
Failed to bring up Etcd Plane: etcd cluster is unhealthy. solution
node-role.kubernetes.io/controlplane=true:NoSchedule means no pod will be able to schedule onto this node, unless it has a matching toleration. To remove this taint: kubectl taint nodes node1 key1=value1:NoSchedule-. In our case, this taint is normal on the master node, since we only setup one node for etcd and control plane. Hence, no need to remove this taint.
node-role.kubernetes.io/etcd=true:NoExecute same as above.

Setting wrong variables during helm install rancher, by upgrading:

helm upgrade rancher rancher-stable/rancher \
   --namespace cattle-system \
   --set hostname=rancher.my.com \
   --set bootstrapPassword=secret \
   --set tls=external

To clean up Rancher, please use rancher-cleanup.