---
id: openebs-nfs-provisioner-kill
title: OpenEBS NFS Provisioner Kill Chaos Experiment Details
sidebar_label: NFS Provisioner Kill
---

## Experiment Metadata

| Type | Description | Tested K8s Platform |
| ---- | ----------- | ------------------- |
| OpenEBS | Kill the OpenEBS NFS provisioner container and check if pods consuming the NFS PVs continue to be available and volumes are writable (ReadWriteMany/RWX mode) | GKE |

Prerequisites

  • Ensure that the Litmus Chaos Operator is running by executing kubectl get pods in operator namespace (typically, litmus). If not, install from here

  • Ensure that the openebs-nfs-provisioner-kill experiment resource is available in the cluster by executing kubectl get chaosexperiments in the openebs namespace. If not, install from here

  • The "DATA_PERSISTENCE" env variable takes effect only if the "EXTERNAL_APP_CHECK" is enabled. A stateful busybox deployment is used to create and validate data persistence on the RMW NFS persistent volumes.

The data persistence also needs the users to create a minimal configmap upfront (like described below) in the namespace where the experiment resources are created.

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: openebs-nfs-provisioner-kill
data:
  parameters.yml: |
    blocksize: 4k
    blockcount: 1024
    testfile: exampleFile
```
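
A minimal sketch of verifying these prerequisites and creating the ConfigMap, assuming the operator runs in the `litmus` namespace and the manifest above is saved as `nfs-chaos-cm.yaml` (an illustrative filename):

```bash
# Confirm the Litmus Chaos Operator is running (operator namespace assumed to be 'litmus')
kubectl get pods -n litmus

# Confirm the experiment CR is available in the 'openebs' namespace
kubectl get chaosexperiments openebs-nfs-provisioner-kill -n openebs

# Create the data-persistence ConfigMap in the namespace where the
# experiment resources are created (filename is illustrative)
kubectl apply -f nfs-chaos-cm.yaml -n <experiment-namespace>
```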

## Entry Criteria

- OpenEBS NFS provisioner pod(s) are healthy before chaos injection
- Pods using NFS PVs in RWX mode are healthy before chaos injection

## Exit Criteria

- OpenEBS NFS provisioner pod(s) are healthy after chaos injection
- Pods using NFS PVs in RWX mode are healthy after chaos injection
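
These criteria can be checked manually; a sketch, assuming the provisioner runs in the `app-nfs-ns` namespace with the `app=nfs` label (the values used in the sample ChaosEngine below):

```bash
# NFS provisioner pod(s) should report Running/Ready before and after chaos
kubectl get pods -n app-nfs-ns -l app=nfs

# Application pods consuming the RWX NFS PVs should also be healthy
kubectl get pods -n <application-namespace>
```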

## Details

- This scenario verifies application behaviour upon crash/failure of the OpenEBS NFS provisioner container, when the NFS provisioner is backed by OpenEBS PVs

## Integrations

- Container kill is achieved using the pumba or containerd_chaos chaos library
- The desired pumba or containerd image can be configured via the `LIB_IMAGE` env variable, as shown in the sketch after this list.
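
For example, to switch from the default pumba library to the containerd one, the relevant env overrides in the ChaosEngine (see the tunables table below) would be:

```yaml
# Sketch of the experiment env overrides for a containerd runtime,
# using the values documented in the tunables table
- name: LIB
  value: 'containerd'
- name: LIB_IMAGE
  value: 'gprasath/crictl:ci'
```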

## Steps to Execute the Chaos Experiment

- This chaos experiment can be triggered by creating a ChaosEngine resource on the cluster. To understand the values to be provided in a ChaosEngine specification, refer Getting Started

- Follow the steps in the sections below to create the chaosServiceAccount, prepare the ChaosEngine & execute the experiment.

## Prepare chaosServiceAccount

Use this sample RBAC manifest to create a chaosServiceAccount in the desired namespace. This example consists of the minimum necessary role permissions to execute the experiment.

### Sample RBAC Manifest

```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-chaos-sa
  namespace: default
  labels:
    name: nfs-chaos-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-chaos-sa
  labels:
    name: nfs-chaos-sa
    app.kubernetes.io/part-of: litmus
rules:
- apiGroups: ["","apps","litmuschaos.io","batch","extensions","storage.k8s.io"]
  resources: ["pods","pods/exec","pods/log","deployments","events","jobs","configmaps","services","persistentvolumeclaims","storageclasses","persistentvolumes","chaosexperiments","chaosresults","chaosengines"]
  verbs: ["create","list","get","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nfs-chaos-sa
  labels:
    name: nfs-chaos-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nfs-chaos-sa
subjects:
- kind: ServiceAccount
  name: nfs-chaos-sa
  namespace: default
```
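
Apply the manifest to create the service account, cluster role, and binding (the filename is illustrative):

```bash
kubectl apply -f nfs-chaos-rbac.yaml

# Verify the service account exists in the target namespace
kubectl get sa nfs-chaos-sa -n default
```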

## Prepare ChaosEngine

- Provide the application info in `spec.appinfo`
- Override the experiment tunables if desired in `experiments.spec.components.env`
- To understand the values to be provided in a ChaosEngine specification, refer ChaosEngine Concepts

### Supported Experiment Tunables

| Variables | Description | Specify In ChaosEngine | Notes |
| --------- | ----------- | ---------------------- | ----- |
| TARGET_CONTAINER | The container to be killed inside the pod | Mandatory | If TARGET_CONTAINER is not provided, the first container listed in the deployment spec is killed |
| LIB_IMAGE | The pumba/containerd image used to run the kill command | Optional | Defaults to `gaiaadm/pumba:0.6.5`; for the containerd runtime, use `gprasath/crictl:ci` |
| LIB | The chaos library used to inject the chaos | Optional | Can be `pumba` or `containerd` |
| RAMP_TIME | Period to wait before and after injection of chaos, in seconds | Optional | |
| INSTANCE_ID | A user-defined string holding metadata about the current run/instance of chaos, e.g. 04-05-2020-9-00. It is appended as a suffix to the chaosresult CR name | Optional | Ensure that the overall length of the chaosresult CR name remains < 64 characters |

### Sample ChaosEngine Manifest

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nfs-chaos
  namespace: default
spec:
  # It can be active/stop
  engineState: 'active'
  appinfo:
    appns: 'minio'
    applabel: 'app=minio'
    appkind: 'deployment'
  chaosServiceAccount: nfs-chaos-sa
  experiments:
    - name: openebs-nfs-provisioner-kill
      spec:
        components:
          env:
            # provide the total chaos duration
            - name: TOTAL_CHAOS_DURATION
              value: '20'

            - name: NFS_PROVISIONER_NAMESPACE
              value: 'app-nfs-ns'

            - name: NFS_PROVISIONER_LABEL
              value: 'app=nfs'

            - name: NFS_PVC
              value: 'nfs-pvc-claim'

            - name: NFS_SVC
              value: 'nfs-provisioner'

            - name: TARGET_CONTAINER
              value: 'nfs-provisioner'

            # EXTERNAL_APP_CHECK can be true/false
            - name: EXTERNAL_APP_CHECK
              value: 'true'
```
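
As the comment in the manifest notes, `spec.engineState` accepts `active` or `stop`; an in-flight experiment can be halted by patching the engine, for example:

```bash
kubectl patch chaosengine nfs-chaos -n default \
  --type merge -p '{"spec":{"engineState":"stop"}}'
```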

## Create the ChaosEngine Resource

- Apply the ChaosEngine manifest prepared in the previous step to trigger the chaos.

  ```bash
  kubectl apply -f chaosengine.yml -n <application-namespace>
  ```

- If the chaos experiment is not executed, refer to the troubleshooting section to identify the root cause and fix the issues.
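
Once the engine is applied, the operator launches a chaos runner pod and an experiment job; a quick sanity check (pod naming such as `<engine-name>-runner` can vary across Litmus versions):

```bash
kubectl get pods -n <application-namespace> | grep -E 'nfs-chaos-runner|openebs-nfs-provisioner-kill'
```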

## Watch Chaos Progress

- View pod terminations by setting up a watch on the pods in the application namespace

  ```bash
  watch -n 1 kubectl get pods -n <application-namespace>
  ```
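
Chaos events are also recorded against the ChaosEngine resource, so describing it shows experiment progress:

```bash
kubectl describe chaosengine nfs-chaos -n <application-namespace>
```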

## Check Chaos Experiment Result

- Check whether the OpenEBS NFS pod is resilient to the container kill, once the experiment (job) is completed. The ChaosResult resource naming convention is: `<ChaosEngine-Name>-<ChaosExperiment-Name>`.

  ```bash
  kubectl describe chaosresult nfs-chaos-openebs-nfs-provisioner-kill -n <application-namespace>
  ```
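
To extract just the verdict, a jsonpath query can be used; `.status.experimentStatus.verdict` is assumed here, and the exact field path may vary across Litmus versions:

```bash
kubectl get chaosresult nfs-chaos-openebs-nfs-provisioner-kill -n <application-namespace> \
  -o jsonpath='{.status.experimentStatus.verdict}'
```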

## Recovery

- If the verdict of the ChaosResult is `Fail`, and/or the OpenEBS components do not return to a healthy state after the chaos experiment, refer to the OpenEBS troubleshooting guide for more information on how to recover.

## OpenEBS NFS Kill Chaos Demo [TODO]

- A sample recording of this experiment execution is provided here.