id | title | sidebar_label |
---|---|---|
admin-mode |
Administrator Mode |
Administrator Mode |
Admin mode is one of the ways the chaos orchestration is set up in Litmus, wherein all chaos resources (i.e., install time resources like the operator, chaosexperiment CRs, chaosServiceAccount/rbac and runtime resources like chaosengine, chaos-runner, experiment jobs & chaosresults) are setup in a single admin namespace (typically, litmus). In other words, centralized administration of chaos. This feature is aimed at making the SRE/Cluster Admins life easier by doing away with setting up chaos pre-requisites on a per namespace basis (which may be more relevant in an autonomous/self-service cluster sharing model in dev environments). This mode typically needs a "wider" & "stronger" ClusterRole, albeit one that is still just a superset of the individual experiment permissions. In this mode, the applications in their respective namespaces are subjected to chaos while the chaos job runs elsewhere, i.e., admin namespace.
In order to use Admin Mode, you just have to create a ServiceAccount in the admin or so called chaos namespace (litmus
itself can be used), which is tied to a ClusterRole that has the permissions to perform operations on Kubernetes resources involved in the selected experiments across namespaces.
Provide this ServiceAccount in ChaosEngine's .spec.chaosServiceAccount.
- Select Chaos Experiment from hub.litmuschaos.io and click on
INSTALL EXPERIMENT
button.
kubectl apply -f https://hub.litmuschaos.io/api/chaos/master?file=charts/generic/pod-delete/experiment.yaml -n litmus
Here is an RBAC definition, which in essence is a superset of individual experiments RBAC that has the permissions to run all chaos experiments across different namespaces.
apiVersion: v1
kind: ServiceAccount
metadata:
name: litmus-admin
namespace: litmus
labels:
name: litmus-admin
---
# Source: openebs/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: litmus-admin
labels:
name: litmus-admin
rules:
- apiGroups: [""]
resources: ["pods","events","configmaps","secrets","services"]
verbs: ["create","delete","get","list","patch","update", "deletecollection"]
- apiGroups: [""]
resources: ["pods/exec","pods/log","pods/eviction","replicationcontrollers"]
verbs: ["get","list","create"]
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["create","list","get","delete","deletecollection"]
- apiGroups: ["apps"]
resources: ["deployments","statefulsets"]
verbs: ["list","get","patch","update"]
- apiGroups: ["apps"]
resources: ["replicasets"]
verbs: ["list","get"]
- apiGroups: ["apps"]
resources: ["daemonsets"]
verbs: ["list","get","delete"]
- apiGroups: ["apps.openshift.io"]
resources: ["deploymentconfigs"]
verbs: ["list","get"]
- apiGroups: ["argoproj.io"]
resources: ["rollouts"]
verbs: ["list","get"]
- apiGroups: ["litmuschaos.io"]
resources: ["chaosengines","chaosexperiments","chaosresults"]
verbs: ["create","list","get","patch","update","delete"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["patch","get","list","update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: litmus-admin
labels:
name: litmus-admin
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: litmus-admin
subjects:
- kind: ServiceAccount
name: litmus-admin
namespace: litmus
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: nginx-chaos
namespace: litmus #Chaos Resources Namespace
spec:
appinfo:
appns: "default" #Application Namespace
applabel: "app=nginx"
appkind: "deployment"
# It can be true/false
annotationCheck: "true"
# It can be active/stop
engineState: "active"
#ex. values: ns1:name=percona,ns2:run=nginx
auxiliaryAppInfo: ""
chaosServiceAccount: litmus-admin
# It can be delete/retain
jobCleanUpPolicy: "delete"
experiments:
- name: pod-delete
spec:
components:
env:
# set chaos duration (in sec) as desired
- name: TOTAL_CHAOS_DURATION
value: "30"
# set chaos interval (in sec) as desired
- name: CHAOS_INTERVAL
value: "10"
# pod failures without '--force' & default terminationGracePeriodSeconds
- name: FORCE
value: "false"
-
Create the ChaosEngine manifest prepared in the previous step to trigger the Chaos.
kubectl apply -f chaosengine.yml
-
Describe Chaos Engine for chaos steps.
kubectl describe chaosengine nginx-chaos -n litmus
-
View pod terminations & recovery by setting up a watch on the pods in the application namespace
watch -n 1 kubectl get pods -n default
-
Check whether the application is resilient to the pod failure, once the experiment (job) is completed. The ChaosResult resource name is derived like this:
<ChaosEngine-Name>-<ChaosExperiment-Name>
.kubectl describe chaosresult nginx-chaos-pod-delete -n litmus