Rancher k8s cluster setup
An experimental record
Clarification
This record covers two setups:
- Production: see the Production checklist and the Production section below.
- Development: see the Development checklist and the Development section below.
Prerequisites
- [IMPORTANT] Configure the machine names (optional):
```bash
## master
sudo hostnamectl set-hostname "k8s-master"
exec bash
## node1
sudo hostnamectl set-hostname "k8s-node1"
exec bash
## node2
sudo hostnamectl set-hostname "k8s-node2"
exec bash
## node3
sudo hostnamectl set-hostname "k8s-node3"
exec bash
```
- [IMPORTANT] Configure `/etc/hosts` by adding the master's & workers' IP addresses (optional). Here we use four machines for demonstration: the master runs etcd & the control plane, and the rest are workers (visit Checklist for Production-Ready Clusters for more information):
```
192.168.50.140 k8s-master
192.168.50.141 k8s-node1
192.168.50.142 k8s-node2
192.168.50.143 k8s-node3
192.168.50.144 k8s-node4
```
- Set the timezone:
```bash
sudo timedatectl set-timezone Asia/Shanghai
```
- Turn off swap:
```bash
sudo swapoff -a
```
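Note that `swapoff -a` only disables swap until the next reboot; to keep it off permanently, the swap entry in `/etc/fstab` must be disabled as well. A minimal sketch (the `sed` pattern assumes a standard fstab swap line):
```bash
# disable swap for the current boot
sudo swapoff -a
# comment out swap entries so swap stays off after reboot (writes a .bak backup)
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
```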
Kubectl
Skip this step if using RKE2 & Rancher; go directly to RKE2.
Install these packages on all of your machines:
- `kubeadm`: the command to bootstrap the cluster
- `kubelet`: the component that runs on all of the machines in your cluster and does things like starting pods and containers
- `kubectl`: the command line utility to talk to your cluster
- Import the GPG key. This step is very important, especially when lacking a proxy server:
```bash
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg
```
or by using the TUNA mirror:
```bash
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://mirrors.tuna.tsinghua.edu.cn/kubernetes/apt/doc/apt-key.gpg
```
- Create `/etc/apt/sources.list.d/kubernetes.list`, another important step for setting up mirrors:
```
deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
```
or by using the TUNA mirror:
```
deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://mirrors.tuna.tsinghua.edu.cn/kubernetes/apt kubernetes-xenial main
```
- Installation:
```bash
sudo apt-get update
# (to set up the cluster manually with kubeadm instead of using Rancher:)
# sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-get install -y kubectl
```
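A quick sanity check that the installation succeeded:
```bash
# prints only the client version, so no cluster connection is required
kubectl version --client
```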
Docker
Skip this step if using RKE2 & Rancher; go directly to RKE2.
Docker serves as the container runtime here.
- Uninstall old versions:
```bash
sudo apt-get remove docker docker-engine docker.io containerd runc
```
- Update the apt package index and install packages to allow apt to use a repository over HTTPS:
```bash
sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
```
- Add Docker's official GPG key:
```bash
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
```
- Set up the repository:
```bash
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```
- Install the Docker engine:
```bash
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
```
- Create the docker group if it does not exist:
```bash
sudo groupadd docker
```
- Add your user to the docker group:
```bash
sudo usermod -aG docker $USER
```
- Run the following command, or log out and log back in (if that doesn't work, you may need to reboot your machine first):
```bash
newgrp docker
```
- Enable Docker to start on boot:
```bash
sudo systemctl enable docker.service
sudo systemctl enable containerd.service
```
or disable:
```bash
sudo systemctl disable docker.service
sudo systemctl disable containerd.service
```
- Check that Docker can be run without root:
```bash
docker run hello-world
```
RKE2
RKE2, also known as RKE Government, is Rancher's next-generation Kubernetes distribution.
It is a fully conformant Kubernetes distribution that focuses on security and compliance within the U.S. Federal Government sector.
To meet these goals, RKE2 does the following:
- Provides defaults and configuration options that allow clusters to pass the CIS Kubernetes Benchmark v1.6 with minimal operator intervention
- Enables FIPS 140-2 compliance
- Regularly scans components for CVEs using trivy in our build pipeline
Note
- [Ubuntu users skip this step] If using CentOS instead of Ubuntu: according to a known issue, configure NetworkManager before installing RKE2 (otherwise reboot first). Create a config file called `rke2-canal.conf` in `/etc/NetworkManager/conf.d`:
```ini
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
```
then reload:
```bash
systemctl reload NetworkManager
```
Server Node
- Switch to the root user.
- Run the installer:
```bash
curl -sfL https://get.rke2.io | sh -
```
- Enable the rke2-server service:
```bash
systemctl enable rke2-server.service
```
- Start the service:
```bash
systemctl start rke2-server.service
```
- Follow the logs (optional):
```bash
journalctl -u rke2-server -f
```
IMPORTANT!
After running this installation:
- The `rke2-server` service will be installed. The `rke2-server` service will be configured to automatically restart after node reboots or if the process crashes or is killed.
- Additional utilities will be installed at `/var/lib/rancher/rke2/bin/`. They include: `kubectl`, `crictl`, and `ctr`. Note that these are not on your path by default.
- Two cleanup scripts will be installed to the path at `/usr/local/bin/rke2`. They are: `rke2-killall.sh` and `rke2-uninstall.sh`.
- A kubeconfig file will be written to `/etc/rancher/rke2/rke2.yaml`.
- A token that can be used to register other server or agent nodes will be created at `/var/lib/rancher/rke2/server/node-token`.
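For example, with the paths listed above, we can print the join token for the agent nodes and confirm the server node came up, using the bundled `kubectl`:
```bash
# print the token that agent nodes will use to register
cat /var/lib/rancher/rke2/server/node-token
# check that the server node is Ready, using the generated kubeconfig
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
```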
Agent Node
- Switch to the root user.
- Run the installer:
```bash
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
```
- Enable the rke2-agent service:
```bash
systemctl enable rke2-agent.service
```
- Configure the rke2-agent service:
```bash
mkdir -p /etc/rancher/rke2/
vim /etc/rancher/rke2/config.yaml
```
Content:
```yaml
server: https://<server>:9345
token: <token from server node>
```
- Start the service:
```bash
systemctl start rke2-agent.service
```
- Follow the logs (optional):
```bash
journalctl -u rke2-agent -f
```
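Once the agent service is running, it should register with the server; verify from the server node:
```bash
# run on the server node; the new agent should appear and eventually become Ready
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes -o wide
```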
Cluster Access
- Switch back out of the root user.
- Copy the kubeconfig: `mkdir -p ~/.kube && cp /etc/rancher/rke2/rke2.yaml ~/.kube/config`. This works on a remote machine as well; all we have to do is modify the `server` field to the real IP address.
- Remove the group-readable & world-readable WARNING:
```bash
chmod g-rw ~/.kube/config
chmod o-r ~/.kube/config
```
- Export the environment variables via `vim ~/.profile`:
```bash
export PATH=$PATH:/var/lib/rancher/rke2/bin/
export KUBECONFIG=~/.kube/config
```
and `source ~/.profile`.
- Check access:
```bash
kubectl get pods --all-namespaces
helm ls --all-namespaces
```
Or specify the location of the kubeconfig file in the command:
```bash
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get pods --all-namespaces
helm --kubeconfig /etc/rancher/rke2/rke2.yaml ls --all-namespaces
```
- Accessing the cluster from outside with kubectl:
Copy `/etc/rancher/rke2/rke2.yaml` to a machine located outside the cluster as `~/.kube/config`, then replace `127.0.0.1` with the IP or hostname of your RKE2 server. `kubectl` can now manage your RKE2 cluster.
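A minimal sketch of that outside access, assuming SSH access to the server node (replace `<server>` with the real IP or hostname):
```bash
# copy the kubeconfig from the server node
scp root@<server>:/etc/rancher/rke2/rke2.yaml ~/.kube/config
# point kubectl at the server instead of the loopback address
sed -i 's/127.0.0.1/<server>/' ~/.kube/config
kubectl get nodes
```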
Helm
Helm is a tool for managing packages of pre-configured Kubernetes resources. These packages are known as Helm charts.
Use Helm to:
- Find and use popular software packaged as Kubernetes charts
- Share your own applications as Kubernetes charts
- Create reproducible builds of your Kubernetes applications
- Intelligently manage your Kubernetes manifest files
- Manage releases of Helm packages
See the official Helm documentation and Rancher's Helm documentation.
Installation
- Install by script (needs a proxy server):
```bash
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
```
Otherwise, install from a released binary (check the latest version):
```bash
wget https://get.helm.sh/helm-v3.9.2-linux-amd64.tar.gz
```
Unpack and move:
```bash
tar -zxvf helm-v3.9.2-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/
```
- Initialize a Helm chart repository:
```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
```
Then we can list the charts we can install:
```bash
helm search repo bitnami
```
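As a hypothetical example, we can then install a chart from that repo as a release, inspect it, and remove it (the release name `my-nginx` is arbitrary):
```bash
# install the bitnami/nginx chart as a release named "my-nginx"
helm install my-nginx bitnami/nginx
# inspect and remove the release
helm status my-nginx
helm uninstall my-nginx
```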
Rancher
Why Rancher?
Rancher is a complete software stack for teams adopting containers. It addresses the operational and security challenges of managing multiple Kubernetes clusters across any infrastructure, while providing DevOps teams with integrated tools for running containerized workloads.
Production
- Add the Helm chart repo:
```bash
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
```
- Create a namespace for Rancher:
```bash
kubectl create namespace cattle-system
```
- Install cert-manager (skip this if you have your own certificate):
```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.crds.yaml
```
Add the Jetstack Helm repository:
```bash
helm repo add jetstack https://charts.jetstack.io
```
Update your local Helm chart repository cache:
```bash
helm repo update
```
Install the cert-manager Helm chart:
```bash
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.7.1
```
Check cert-manager:
```bash
kubectl get pods --namespace cert-manager
```
- [Option 1] Install Rancher without a CA certificate (using cert-manager; add `--set tls=external`):
```bash
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=<DNS name> \
  --set bootstrapPassword=<your secret password> \
  --set tls=external
```
- [Option 2] Install Rancher with a CA certificate, which is recommended:
Use `kubectl` with the `tls` secret type to create the secrets:
```bash
kubectl -n cattle-system create secret tls tls-rancher-ingress \
  --cert=tls.crt \
  --key=tls.key
```
Check this for more detail, and update the Rancher certificate if needed.
Install Rancher:
```bash
helm install rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=<DNS name> \
  --set bootstrapPassword=<your secret password> \
  --set ingress.tls.source=secret
```
- Wait for Rancher to be rolled out:
```bash
kubectl -n cattle-system rollout status deploy/rancher
```
- According to this, modify the RKE2 Nginx config `/var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml` (see the note after this list on how the file is applied):
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      config:
        use-forwarded-headers: "true"
```
- Verify that the Rancher server is successfully deployed:
```bash
kubectl -n cattle-system rollout status deploy/rancher
```
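RKE2 watches `/var/lib/rancher/rke2/server/manifests` and re-applies manifests placed there, so the `HelmChartConfig` above takes effect without a manual `helm` command. A quick check that the ingress controller is running with the new config (the `grep` filter assumes the default `rke2-ingress-nginx` naming):
```bash
# the ingress-nginx controller pods should be Running after the config change
kubectl -n kube-system get pods | grep ingress-nginx
```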
Development
This setup is for development only; do not use it in production.
- Start a Rancher web server:
```bash
sudo docker run --privileged -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:stable
```
- Visit `https://192.168.50.140` (according to the host IP).
- Run `docker logs <container ID> 2>&1 | grep "Bootstrap Password:"` in a terminal (get the container ID with `docker ps`) to obtain the bootstrap password, then paste it into the website.
- Click the `☰` button at the website's top-left, then `Cluster Management`, then the `Create` button at the top-right, and choose `Custom`.
- After entering your `Cluster Name`, keep everything default and click `Next`. In `Node Options`, select `etcd` and `Control Plane` for `140`, and the rest (`141`, `142` and `143`) as `Worker`.
Accessing clusters with `kubectl`:
- Log into Rancher. From the Global view, open the cluster that you want to access with kubectl.
- Click the `Copy KubeConfig to Clipboard` button.
- Paste the contents into a new file on your local computer and move the file to `~/.kube/config`. (Note: the default location kubectl uses for the kubeconfig file is `~/.kube/config`, but you can use any directory and specify it with the `--kubeconfig` flag, as in: `kubectl --kubeconfig /custom/path/kube.config get pods`.)
- Set the global config with `echo "export KUBECONFIG=~/.kube/config" >> ~/.bash_profile` and `source ~/.bash_profile`.
- Now we can use `kubectl version` or `kubectl get nodes` to check whether the configuration succeeded.
Resolutions
- In case there is no `root` user when installing RKE2, initialize the `root` user on Ubuntu:
```bash
# set a root password
sudo passwd root
# lock the root password again when it is no longer needed
sudo passwd -l root
# switch to root
sudo -s -H
```
- Docker image mirror (optional): execute `vim /etc/docker/daemon.json` and add:
```json
{
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"]
}
```
then reload the services:
```bash
sudo systemctl daemon-reload
sudo systemctl restart docker
```
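To confirm the mirror is active:
```bash
# the mirror URL should be listed under "Registry Mirrors"
docker info | grep -A 1 "Registry Mirrors"
```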
- Clear all containers and images when a deployment fails. Use the `sudo rm -rf ...` command carefully if an RKE2 server/agent is running, because it will also unmount production services:
```bash
docker stop $(docker ps -aq)
docker rm $(docker ps -aq)
docker system prune -f
docker volume rm $(docker volume ls -q)
docker image rm $(docker image ls -q)
sudo rm -rf /etc/ceph \
    /etc/cni \
    /etc/kubernetes \
    /opt/cni \
    /opt/rke \
    /run/secrets/kubernetes.io \
    /run/calico \
    /run/flannel \
    /var/lib/calico \
    /var/lib/etcd \
    /var/lib/cni \
    /var/lib/kubelet \
    /var/lib/rancher/rke/log \
    /var/log/containers \
    /var/log/pods \
    /var/run/calico
```
Note that this `rm -rf ...` is very useful when encountering the `etcd connection refused` problem, which usually happens when residual files from a previous cluster are left behind.
- `rm: cannot remove '/var/lib/kubelet/pods/<pods-id>': Device or resource busy`: simply use the `umount` command:
```bash
sudo umount /var/lib/kubelet/pods/<pods-id>
```
Or `sudo reboot` and then execute the commands above.
- `Failed to bring up Etcd Plane: etcd cluster is unhealthy`: see the linked solution.
- `node-role.kubernetes.io/controlplane=true:NoSchedule` means no pod will be able to schedule onto this node unless it has a matching toleration. To remove this taint: `kubectl taint nodes node1 key1=value1:NoSchedule-`. In our case, this taint is normal on the master node, since we only set up one node for etcd and the control plane; hence, there is no need to remove it.
- `node-role.kubernetes.io/etcd=true:NoExecute`: same as above.
- If you set wrong variables during `helm install rancher`, fix them by upgrading:
```bash
helm upgrade rancher rancher-stable/rancher \
  --namespace cattle-system \
  --set hostname=rancher.my.com \
  --set bootstrapPassword=secret \
  --set tls=external
```
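Before upgrading, it can help to inspect the values the release is currently running with (a standard Helm command, not Rancher-specific):
```bash
# show the user-supplied values of the installed rancher release
helm get values rancher --namespace cattle-system
```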
- To clean up Rancher, please use rancher-cleanup.