This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

id: admin-mode
title: Administrator Mode
sidebar_label: Administrator Mode

What is Administrator Mode?

Admin mode is one of the ways chaos orchestration can be set up in Litmus. In this mode, all chaos resources are created in a single admin namespace (typically litmus): install-time resources such as the operator, ChaosExperiment CRs, and the chaosServiceAccount/RBAC, as well as runtime resources such as the ChaosEngine, chaos-runner, experiment jobs, and ChaosResults. In other words, chaos is administered centrally. The feature is aimed at making life easier for SREs and cluster admins by removing the need to set up chaos prerequisites in every namespace (per-namespace setup may be more relevant in an autonomous, self-service cluster-sharing model in dev environments). This mode typically needs a "wider" and "stronger" ClusterRole, albeit one that is still just a superset of the individual experiment permissions. The applications are subjected to chaos in their respective namespaces, while the chaos job runs elsewhere, in the admin namespace.

How to use Administrator Mode?

To use admin mode, create a ServiceAccount in the admin (chaos) namespace (litmus itself can be used) and bind it to a ClusterRole that has permission to operate, across namespaces, on the Kubernetes resources involved in the selected experiments. Then reference this ServiceAccount in the ChaosEngine's .spec.chaosServiceAccount.

Example

Prepare Chaos Experiment

kubectl apply -f "https://hub.litmuschaos.io/api/chaos/master?file=charts/generic/pod-delete/experiment.yaml" -n litmus

Prepare RBAC Manifest

Here is an RBAC definition that is, in essence, a superset of the individual experiments' RBAC. It grants the permissions needed to run all chaos experiments across different namespaces.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: litmus-admin
  namespace: litmus
  labels:
    name: litmus-admin
---
# Source: openebs/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: litmus-admin
  labels:
    name: litmus-admin
rules:
- apiGroups: [""]
  resources: ["pods","events","configmaps","secrets","services"]
  verbs: ["create","delete","get","list","patch","update", "deletecollection"]
- apiGroups: [""]
  resources: ["pods/exec","pods/log","pods/eviction","replicationcontrollers"]
  verbs: ["get","list","create"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create","list","get","delete","deletecollection"]
- apiGroups: ["apps"]
  resources: ["deployments","statefulsets"]
  verbs: ["list","get","patch","update"]
- apiGroups: ["apps"]
  resources: ["replicasets"]
  verbs: ["list","get"]
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  verbs: ["list","get","delete"]
- apiGroups: ["apps.openshift.io"]
  resources: ["deploymentconfigs"]
  verbs: ["list","get"]
- apiGroups: ["argoproj.io"]
  resources: ["rollouts"]
  verbs: ["list","get"]
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines","chaosexperiments","chaosresults"]
  verbs: ["create","list","get","patch","update","delete"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["patch","get","list","update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: litmus-admin
  labels:
    name: litmus-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: litmus-admin
subjects:
  - kind: ServiceAccount
    name: litmus-admin
    namespace: litmus
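
Once this manifest has been applied to the cluster, one way to sanity-check the binding is to ask the API server, via impersonation, whether the ServiceAccount may act in an application namespace. The sketch below is a hypothetical helper (the name sa_can_i is not part of Litmus) wrapped around kubectl auth can-i. It assumes the caller is allowed to impersonate ServiceAccounts, which is normally the case for a cluster admin.

```shell
# Hypothetical helper, not part of Litmus.
# Usage: sa_can_i <sa-namespace> <sa-name> <verb> <resource> <target-namespace>
# Prints the answer reported by kubectl for the impersonated ServiceAccount.
sa_can_i() {
  kubectl auth can-i "$3" "$4" \
    --as="system:serviceaccount:$1:$2" \
    -n "$5"
}
```

For example, sa_can_i litmus litmus-admin delete pods default would be expected to answer yes when the ClusterRoleBinding above is in effect, since the ClusterRole grants delete on pods.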

Prepare ChaosEngine

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: litmus #Chaos Resources Namespace
spec:
  appinfo:
    appns: "default" #Application Namespace
    applabel: "app=nginx"
    appkind: "deployment"
  # It can be true/false
  annotationCheck: "true"
  # It can be active/stop
  engineState: "active"
  #ex. values: ns1:name=percona,ns2:run=nginx
  auxiliaryAppInfo: ""
  chaosServiceAccount: litmus-admin
  # It can be delete/retain
  jobCleanUpPolicy: "delete"
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: "30"

            # set chaos interval (in sec) as desired
            - name: CHAOS_INTERVAL
              value: "10"

            # pod failures without '--force' & default terminationGracePeriodSeconds
            - name: FORCE
              value: "false"
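
Because annotationCheck is set to "true" in this manifest, the target application is expected to carry the chaos annotation before the engine acts on it. The following is a minimal sketch, assuming the litmuschaos.io/chaos annotation key used by Litmus releases of this era (verify it against the version you run). annotate_for_chaos is a hypothetical helper name, not a Litmus command.

```shell
# Hypothetical helper, not part of Litmus.
# Usage: annotate_for_chaos <kind/name> <application-namespace>
# Marks the target workload as eligible for chaos (annotation key is an assumption).
annotate_for_chaos() {
  kubectl annotate "$1" litmuschaos.io/chaos="true" -n "$2"
}
```

For the engine above, this might be called as annotate_for_chaos deployment/nginx default. The deployment name nginx is an assumption, since the manifest only specifies the label app=nginx. Alternatively, setting annotationCheck to "false" skips the check.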

Create the ChaosEngine Resource

  • Apply the ChaosEngine manifest prepared in the previous step to trigger the chaos.

    kubectl apply -f chaosengine.yml

Watch Chaos Engine

  • Describe the ChaosEngine to follow the chaos steps.

    kubectl describe chaosengine nginx-chaos -n litmus

Watch Chaos progress

  • View pod terminations and recovery by setting up a watch on the pods in the application namespace.

    watch -n 1 kubectl get pods -n default

Check Chaos Experiment Result

  • Once the experiment (job) has completed, check whether the application is resilient to the pod failure. The ChaosResult resource name is derived as <ChaosEngine-Name>-<ChaosExperiment-Name>.

    kubectl describe chaosresult nginx-chaos-pod-delete -n litmus
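
The naming rule for the ChaosResult can be captured in a small shell function so that scripts do not hard-code the result name. chaosresult_name is a hypothetical helper for illustration, not a Litmus command.

```shell
# Hypothetical helper, not part of Litmus.
# Usage: chaosresult_name <chaosengine-name> <chaosexperiment-name>
# Joins the two names with a hyphen, following the rule stated in this section.
chaosresult_name() {
  printf '%s-%s\n' "$1" "$2"
}
```

With it, kubectl describe chaosresult "$(chaosresult_name nginx-chaos pod-delete)" -n litmus targets the same resource as the command in this section.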