This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

openebs-target-network-delay.md

id: openebs-target-network-delay
title: OpenEBS Target Network Latency Experiment Details
sidebar_label: Target Network Latency

Experiment Metadata

Type | Description | Tested K8s Platform
OpenEBS | Induce latency into the cStor target/Jiva controller container | GKE, EKS, Konvoy(AWS), Packet(Kubeadm), Minikube, OpenShift(Baremetal)

Note: In this example, we are using nginx as a stateful application that stores static pages on a Kubernetes volume.

Prerequisites

  • Ensure that the Kubernetes cluster uses the Docker runtime

  • Ensure that the Litmus Chaos Operator is running by executing kubectl get pods in the operator namespace (typically, litmus). If not, install from here

  • Ensure that the openebs-target-network-delay experiment resource is available in the cluster. If not, install from here

  • DATA_PERSISTENCE can be enabled by providing the application's info in a configmap volume so that the experiment can perform the necessary checks. Currently, LitmusChaos supports data consistency checks only for MySQL and Busybox.

    • For the MySQL data persistence check, create a configmap as shown below in the application namespace (replace with actual credentials):
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: openebs-target-network-delay
    data:
      parameters.yml: | 
        dbuser: root
        dbpassword: k8sDem0
        dbname: test
    
    • For the Busybox data persistence check, create a configmap as shown below in the application namespace (replace with the actual values):
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: openebs-target-network-delay
    data:
      parameters.yml: | 
        blocksize: 4k
        blockcount: 1024
        testfile: exampleFile
    
  • Ensure that the chaosServiceAccount used for the experiment has cluster-scope permissions as the experiment may involve carrying out the chaos in the openebs namespace while performing application health checks in its respective namespace.
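
The prerequisite checks above could be gathered into a small dry-run script. This is only a sketch: the helper name `prereq_commands`, the `LITMUS_NS` and `APP_NS` variables and their defaults are assumptions made for illustration, and the commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper, not part of Litmus: prints the kubectl commands
# that correspond to the prerequisites listed above.
KUBECTL="${KUBECTL:-kubectl}"
LITMUS_NS="${LITMUS_NS:-litmus}"   # operator namespace (assumed default)
APP_NS="${APP_NS:-default}"        # application namespace (assumed default)

prereq_commands() {
  # Container runtime reported by each node (a docker:// prefix is expected).
  echo "$KUBECTL get nodes -o jsonpath='{.items[*].status.nodeInfo.containerRuntimeVersion}'"
  # Chaos operator pods.
  echo "$KUBECTL get pods -n $LITMUS_NS"
  # The experiment resource.
  echo "$KUBECTL get chaosexperiments openebs-target-network-delay -n $APP_NS"
  # Only relevant when DATA_PERSISTENCE is enabled.
  echo "$KUBECTL get configmap openebs-target-network-delay -n $APP_NS"
}

prereq_commands
```

Piping the output to `sh` runs the checks against the current kubectl context.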

Entry Criteria

  • Application pods are healthy before chaos injection
  • Application writes are successful on OpenEBS PVs

Exit Criteria

  • Stateful application pods are healthy post chaos injection
  • OpenEBS Storage target pods are healthy

If the experiment tunable DATA_PERSISTENCE is set to 'enabled':

  • Application data written prior to chaos is successfully retrieved/read
  • Database consistency is maintained as per db integrity check utils

Details

  • This scenario validates the behaviour of stateful applications and OpenEBS data plane upon high latencies/network delays in accessing the storage controller pod
  • Injects latency on the specified container in the controller pod by starting a traffic control (tc) process with netem rules to add egress delays
  • Latency is injected via the pumba library with the command pumba netem delay, passing the relevant network interface, latency, chaos duration and regex filter for the container name
  • Can test the stateful application's resilience to lossy/slow iSCSI connections
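
To make the mechanism concrete, the sketch below prints the kind of tc rule and pumba invocation described above. The exact command line built by the experiment is not shown in this document, so treat this as an approximation: the interface name `eth0`, the helper names and the container-name regex are assumptions, while the flags (`--interface`, `--tc-image`, `--duration`, `delay --time`, the `re2:` prefix) follow the pumba README.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.

# Egress delay via tc/netem on one interface; the delay is in milliseconds.
netem_delay_cmd() {
  iface="$1"; delay_ms="$2"
  echo "tc qdisc add dev $iface root netem delay ${delay_ms}ms"
}

# Corresponding request through pumba, which runs tc from a helper image
# inside the network namespace of the matching container(s).
pumba_delay_cmd() {
  iface="$1"; tc_image="$2"; duration_s="$3"; delay_ms="$4"; name_regex="$5"
  echo "pumba netem --interface $iface --tc-image $tc_image --duration ${duration_s}s delay --time $delay_ms re2:$name_regex"
}

# Values taken from the tunables in this document; eth0 is assumed.
netem_delay_cmd eth0 30000
pumba_delay_cmd eth0 gaiadocker/iproute2 60 30000 cstor-istgt
```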

Integrations

  • Network delay is achieved using the pumba chaos library in case of the docker runtime. Support for other runtimes via direct invocation of tc will be added soon.
  • The desired lib image can be configured in the env variable LIB_IMAGE.

Steps to Execute the Chaos Experiment

  • This Chaos Experiment can be triggered by creating a ChaosEngine resource on the cluster. To understand the values to provide in a ChaosEngine specification, refer to Getting Started

  • Follow the steps in the sections below to prepare the ChaosEngine & execute the experiment.

Prepare chaosServiceAccount

Use this sample RBAC manifest to create a chaosServiceAccount in the desired (app)namespace. This example consists of the minimum necessary cluster role permissions to execute the experiment.

Sample RBAC Manifest

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: target-network-delay-sa
  namespace: default
  labels:
    name: target-network-delay-sa
    app.kubernetes.io/part-of: litmus
---
# Source: openebs/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: target-network-delay-sa
  labels:
    name: target-network-delay-sa
    app.kubernetes.io/part-of: litmus
rules:
- apiGroups: ["","apps","litmuschaos.io","batch","extensions","storage.k8s.io"]
  resources: ["pods","pods/exec","pods/log","events","jobs","configmaps","secrets","services","persistentvolumeclaims","storageclasses","persistentvolumes","chaosexperiments","chaosresults","chaosengines"]
  verbs: ["create","list","get","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: target-network-delay-sa
  labels:
    name: target-network-delay-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: target-network-delay-sa
subjects:
- kind: ServiceAccount
  name: target-network-delay-sa
  namespace: default
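
Once the manifest is applied, it may be worth confirming that the service account can act outside its own namespace, since the experiment touches the openebs namespace as well. A minimal sketch using `kubectl auth can-i` with impersonation follows; the helper name `sa_can` is hypothetical, and the caller needs permission to impersonate service accounts.

```shell
#!/bin/sh
KUBECTL="${KUBECTL:-kubectl}"

# Ask the API server whether a service account may perform a verb on a
# resource in a given namespace.
sa_can() {
  verb="$1"; resource="$2"; sa_ns="$3"; sa_name="$4"; target_ns="$5"
  "$KUBECTL" auth can-i "$verb" "$resource" \
    --as="system:serviceaccount:${sa_ns}:${sa_name}" -n "$target_ns"
}

# Example (requires cluster access):
#   sa_can list pods default target-network-delay-sa openebs
#   sa_can create pods/exec default target-network-delay-sa openebs
```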

Prepare ChaosEngine

  • Provide the application info in spec.appinfo
  • Provide the auxiliary applications info (ns & labels) in spec.auxiliaryAppInfo
  • Override the experiment tunables if desired in experiments.spec.components.env
  • Provide the configMaps and secrets in experiments.spec.components.configMaps/secrets. For more info, refer to Sample ChaosEngine
  • To understand the values to provide in a ChaosEngine specification, refer to ChaosEngine Concepts

Supported Experiment Tunables

Variables | Description | Type | Notes
APP_PVC | The PersistentVolumeClaim used by the stateful application | Mandatory | PVC may use either OpenEBS Jiva/cStor storage class
LIB_IMAGE | The chaos library image used to inject the latency | Optional | Defaults to `gaiaadm/pumba:0.6.5`. Supported: `docker : gaiaadm/pumba:0.6.5`
CONTAINER_RUNTIME | The container runtime used in the Kubernetes Cluster | Optional | Defaults to `docker`. Supported: `docker`
TARGET_CONTAINER | The container into which delays are injected in the storage controller pod | Optional | Defaults to `cstor-istgt`
TOTAL_CHAOS_DURATION | Total duration for which network latency is injected | Optional | Defaults to 60 seconds
DEPLOY_TYPE | Type of Kubernetes resource used by the stateful application | Optional | Defaults to `deployment`. Supported: `deployment`, `statefulset`
TC_IMAGE | Image used for traffic control in linux | Optional | Default value is `gaiadocker/iproute2`
NETWORK_DELAY | Egress delay injected into the target container | Optional | Defaults to 60000 milliseconds (60s)
DATA_PERSISTENCE | Flag to perform data consistency checks on the application | Optional | Default value is disabled (empty/unset). It supports only `mysql` and `busybox`. Ensure configmap with app details are created
INSTANCE_ID | A user-defined string that holds metadata/info about current run/instance of chaos. Ex: 04-05-2020-9-00. This string is appended as suffix in the chaosresult CR name. | Optional | Ensure that the overall length of the chaosresult CR is still < 64 characters
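
The length constraint on INSTANCE_ID can be checked before the engine is created. The sketch below assumes the suffix is joined to the ChaosResult name with a hyphen, which this document does not state explicitly; `chaosresult_name` and `name_fits` are hypothetical helpers.

```shell
#!/bin/sh
# Build the expected ChaosResult name: <engine>-<experiment>[-<instance id>].
# Joining the instance id with a hyphen is an assumption.
chaosresult_name() {
  engine="$1"; experiment="$2"; instance_id="${3:-}"
  if [ -n "$instance_id" ]; then
    echo "${engine}-${experiment}-${instance_id}"
  else
    echo "${engine}-${experiment}"
  fi
}

# Succeeds when the name is shorter than 64 characters.
name_fits() {
  name="$1"
  [ "${#name}" -lt 64 ]
}

name="$(chaosresult_name target-chaos openebs-target-network-delay 04-05-2020-9-00)"
if name_fits "$name"; then
  echo "ok: $name"
else
  echo "too long: $name"
fi
```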

Sample ChaosEngine Manifest

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: target-chaos
  namespace: default
spec:
  # It can be active/stop
  engineState: 'active'
  #ex. values: ns1:name=percona,ns2:run=nginx 
  auxiliaryAppInfo: ''
  appinfo:
    appns: 'default'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: target-network-delay-sa
  experiments:
    - name: openebs-target-network-delay
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: '60' # in seconds

            - name: TARGET_CONTAINER
              value: 'cstor-istgt'

            - name: APP_PVC
              value: 'demo-nginx-claim'    

            - name: DEPLOY_TYPE
              value: 'deployment'   

            - name: NETWORK_DELAY
              value: '30000'
              

Create the ChaosEngine Resource

  • Create the ChaosEngine manifest prepared in the previous step to trigger the Chaos.

    kubectl apply -f chaosengine.yml

  • If the chaos experiment is not executed, refer to the troubleshooting section to identify the root cause and fix the issues.

Watch Chaos progress

  • View network delay in action by setting up a ping to the storage controller in the OpenEBS namespace

  • Watch the behaviour of the application pod and the OpenEBS data replica/pool pods by setting up a watch on the respective namespaces

    watch -n 1 kubectl get pods -n <application-namespace>
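
For the ping mentioned above, the target pod's IP has to be looked up first. A minimal sketch, assuming the pod name is already known (for example from `kubectl get pods -n openebs`) and that the ping is run from a node or pod that can reach pod IPs; `target_pod_ip` and `ping_target` are hypothetical helper names.

```shell
#!/bin/sh
KUBECTL="${KUBECTL:-kubectl}"

# Print the IP of a pod, given its name and namespace.
target_pod_ip() {
  "$KUBECTL" get pod "$1" -n "$2" -o jsonpath='{.status.podIP}'
}

# Ping the pod a fixed number of times (default 20) to observe round-trip times.
ping_target() {
  ping -c "${3:-20}" "$(target_pod_ip "$1" "$2")"
}

# Example (requires cluster access):
#   ping_target <target-pod-name> openebs
```

Round-trip times should rise by roughly the configured NETWORK_DELAY while the chaos is active and return to normal afterwards.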

Check Chaos Experiment Result

  • Check whether the application is resilient to the target network delays, once the experiment (job) is completed. The ChaosResult resource naming convention is: <ChaosEngine-Name>-<ChaosExperiment-Name>.

    kubectl describe chaosresult target-chaos-openebs-target-network-delay -n <application-namespace>
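
To pull out just the verdict, the describe output can be filtered. The sketch below builds the resource name from the naming convention above and greps for the verdict line; `chaos_verdict` is a hypothetical helper, and the exact layout of the describe output may differ between Litmus versions.

```shell
#!/bin/sh
KUBECTL="${KUBECTL:-kubectl}"

# Print the verdict line(s) of the ChaosResult for an engine/experiment pair.
chaos_verdict() {
  engine="$1"; experiment="$2"; ns="$3"
  "$KUBECTL" describe chaosresult "${engine}-${experiment}" -n "$ns" \
    | grep -i 'verdict'
}

# Example (requires cluster access):
#   chaos_verdict target-chaos openebs-target-network-delay default
```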

Recovery

  • If the verdict of the ChaosResult is Fail, and/or the OpenEBS components do not return to a healthy state after the chaos experiment, refer to the OpenEBS troubleshooting guide for more info on how to recover them.

OpenEBS Target Network Delay Demo [TODO]

  • A sample recording of this experiment execution is provided here.