This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

id: byoc-pod-delete
title: BYOC Pod Delete Experiment Details
sidebar_label: Service Pod - Application

Experiment Metadata

| Type | Description | Tested K8s Platform |
| --- | --- | --- |
| ChaosToolKit | BYOC pod delete experiment | Kubeadm, Minikube |

Prerequisites

  • Ensure that the Litmus ChaosOperator is running by executing `kubectl get pods` in the operator namespace (typically `litmus`). If not, install from here
  • Ensure that the `k8-pod-delete` experiment resource is available in the cluster by executing `kubectl get chaosexperiments` in the desired namespace. If not, install from here
  • Ensure that the default nginx application is set up in the `default` namespace (if you use a different namespace, run the commands below in that namespace)
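
The three checks above can be sketched as one shell function. This is a minimal sketch, not part of Litmus: the function name `check_prereqs`, the `KUBECTL` override variable, and the `app=nginx` label selector (borrowed from the sample ChaosEngine on this page) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run the prerequisite checks in order.
# KUBECTL is an illustrative override hook; set KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

check_prereqs() {
  operator_ns="${1:-litmus}"   # namespace where the ChaosOperator runs
  app_ns="${2:-default}"       # namespace of the nginx application

  # 1. The ChaosOperator pod should be listed here.
  "$KUBECTL" get pods -n "$operator_ns" || return 1
  # 2. k8-pod-delete should appear among the ChaosExperiments.
  "$KUBECTL" get chaosexperiments -n "$app_ns" || return 1
  # 3. The nginx pods (label app=nginx is assumed) should be listed here.
  "$KUBECTL" get pods -n "$app_ns" -l app=nginx || return 1
}
```

Calling `check_prereqs litmus default` runs the three listings against the cluster; the output still has to be read by eye, since the function only stops early when a command itself fails.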

Entry Criteria

  • Application replicas are healthy before chaos injection
  • Service resolution works successfully, as determined by deploying a sample nginx application and a custom liveness app that queries the nginx application health endpoint

Exit Criteria

  • Application replicas are healthy after chaos injection
  • Service resolution works successfully, as determined by deploying a sample nginx application and a custom liveness app that queries the nginx application health endpoint

Details

  • Causes graceful pod failure of application replicas using ChaosToolKit, selecting pods by the provided namespace and label while running health checks against the application endpoint
  • Tests deployment sanity with a steady-state hypothesis executed before and after the pod failures
  • Service resolution will fail if no application replicas are present.

Use Cases for executing the experiment

| Type | Experiment | Details | json |
| --- | --- | --- | --- |
| ChaosToolKit | ChaosToolKit single, random pod delete experiment with count | Executing via label name app=<> | pod-app-kill-count.json |
| ChaosToolKit | ChaosToolKit single, random pod delete experiment | Executing via label name app=<> | pod-app-kill-health.json |
| ChaosToolKit | ChaosToolKit single, random pod delete experiment with count | Executing via Custom label name =<> | pod-app-kill-count.json |
| ChaosToolKit | ChaosToolKit single, random pod delete experiment | Executing via Custom label name =<> | pod-app-kill-health.json |
| ChaosToolKit | ChaosToolKit All pod delete experiment with health validation | Executing via Custom label name app=<> | pod-app-kill-all.json |
| ChaosToolKit | ChaosToolKit All pod delete experiment with health validation | Executing via Custom label name =<> | pod-custom-kill-all.json |

Integrations

  • Pod failures can be effected using one of these chaos libraries: litmus

Steps to Execute the ChaosExperiment

  • This ChaosExperiment can be triggered by creating a ChaosEngine resource on the cluster. To understand the values to provide in a ChaosEngine specification, refer to Getting Started

  • Follow the steps in the sections below to create the chaosServiceAccount, prepare the ChaosEngine & execute the experiment.

Prepare chaosServiceAccount

  • Use this sample RBAC manifest to create a chaosServiceAccount in the desired (app) namespace. This example consists of the minimum necessary role permissions to execute the experiment.

Sample RBAC Manifest

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8-pod-delete-sa
  namespace: default
  labels:
    name: k8-pod-delete-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: k8-pod-delete-sa
  namespace: default
  labels:
    name: k8-pod-delete-sa
    app.kubernetes.io/part-of: litmus
rules:
- apiGroups: ["","apps","batch"]
  resources: ["jobs","deployments","daemonsets"]
  verbs: ["create","list","get","patch","delete"]
- apiGroups: ["","litmuschaos.io"]
  resources: ["pods","configmaps","events","services","chaosengines","chaosexperiments","chaosresults","deployments","jobs"]
  verbs: ["get","create","update","patch","delete","list"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get","list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: k8-pod-delete-sa
  namespace: default
  labels:
    name: k8-pod-delete-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: k8-pod-delete-sa
subjects:
- kind: ServiceAccount
  name: k8-pod-delete-sa
  namespace: default
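
Once the manifest is applied, the permissions it grants can be spot-checked with `kubectl auth can-i`, impersonating the service account. The following is a minimal sketch: the function name `verify_chaos_sa` and the `KUBECTL` override variable are hypothetical, and the three permissions queried are only a sample of what the Role above lists.

```shell
#!/bin/sh
# Hypothetical helper: ask the API server whether the chaos service account
# holds a few of the permissions granted by the Role above.
# KUBECTL is an illustrative override hook; set KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

verify_chaos_sa() {
  ns="${1:-default}"
  sa="${2:-k8-pod-delete-sa}"
  # Service accounts are impersonated as system:serviceaccount:<namespace>:<name>.
  subject="system:serviceaccount:${ns}:${sa}"

  # Each query prints "yes" or "no".
  "$KUBECTL" auth can-i delete pods -n "$ns" --as="$subject" || return 1
  "$KUBECTL" auth can-i create jobs.batch -n "$ns" --as="$subject" || return 1
  "$KUBECTL" auth can-i get chaosengines.litmuschaos.io -n "$ns" --as="$subject" || return 1
}
```

For the sample manifest, `verify_chaos_sa default k8-pod-delete-sa` would be expected to answer `yes` three times; a `no` suggests the Role or RoleBinding was created in a different namespace.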

Note: On restricted systems/setups, create a PodSecurityPolicy (PSP) with the required permissions and bind the chaosServiceAccount to it to work around the respective limitations. An example of a standard PSP that can be used for Litmus chaos experiments can be found here.

Prepare ChaosEngine

  • Provide the application info in spec.appinfo
  • Override the experiment tunables if desired in experiments.spec.components.env
  • To understand the values to provide in a ChaosEngine specification, refer to ChaosEngine Concepts

Supported Experiment Tunables

| Variables | Description | Specify In ChaosEngine | Notes |
| --- | --- | --- | --- |
| NAME_SPACE | Chaos namespace in which all infra chaos resources are created | Mandatory | Defaults to `default` |
| LABEL_NAME | The default name of the label | Mandatory | Defaults to `nginx` |
| APP_ENDPOINT | Endpoint that ChaosToolKit calls to ensure the application is healthy | Mandatory | Defaults to `localhost` |
| FILE | Type of pod-delete chaos to execute (in terms of the steady-state checks performed), represented by the ChaosToolKit json file | Mandatory | Defaults to `pod-app-kill-health.json` |
| REPORT | The report of the execution, in json format | Optional | Defaults to `true` |
| REPORT_ENDPOINT | Report endpoint that accepts the json report and submits it | Optional | Defaults to a Kafka topic set up for chaos, but any reporting database can be supported |
| TEST_NAMESPACE | Placeholder for the namespace from which the chaos experiment is executed | Optional | Defaults to `default` |

Sample ChaosEngine Manifest

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos-app-health
  namespace: default
spec:
  appinfo:
    appns: 'default'
    applabel: 'app=nginx'
    appkind: 'deployment'
  engineState: 'active'
  chaosServiceAccount: k8-pod-delete-sa
  experiments:
    - name: k8-pod-delete
      spec:
        components:
          env:
            # set chaos namespace
            - name: NAME_SPACE
              value: 'default'
            # set chaos label name
            - name: LABEL_NAME
              value: 'nginx'
            # pod endpoint
            - name: APP_ENDPOINT
              value: 'localhost'
            - name: FILE
              value: 'pod-app-kill-health.json'
            - name: REPORT
              value: 'true'
            - name: REPORT_ENDPOINT
              value: 'none'
            - name: TEST_NAMESPACE
              value: 'default'
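
To run another scenario from the use-case table, only the `FILE` tunable has to change. The sketch below rewrites one `env` value in a saved manifest; the function name `set_tunable` is hypothetical, and it assumes the manifest keeps the `- name:` / `value:` line layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read a ChaosEngine manifest on stdin and replace the
# value of one env tunable, leaving every other line untouched.
set_tunable() {
  awk -v name="$1" -v val="$2" '
    # Remember whether the most recent "- name:" entry is the one we want.
    $1 == "-" && $2 == "name:" { hit = ($3 == name) }
    # Rewrite the value line that follows it (\047 is a single quote).
    hit && $1 == "value:" {
      sub(/value:.*/, "value: \047" val "\047")
      hit = 0
    }
    { print }
  '
}
```

For example, `set_tunable FILE pod-app-kill-count.json < chaosengine.yml | kubectl apply -f -` would submit the count-based variant, assuming the manifest was saved as `chaosengine.yml`.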

Create the ChaosEngine Resource

  • Apply the ChaosEngine manifest prepared in the previous step to trigger the chaos.

    kubectl apply -f chaosengine.yml

Watch Chaos progress

  • View application pod termination & recovery by setting up a watch on the pods in the application namespace

    watch kubectl get pods

Check ChaosExperiment Result

  • Check whether the application is resilient to the ChaosToolKit pod failure, once the experiment (job) is completed. The ChaosResult resource name is derived like this: <ChaosEngine-Name>-<ChaosExperiment-Name>.

    kubectl describe chaosresult nginx-chaos-app-health-k8-pod-delete -n <chaos-namespace>
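
The naming rule can be captured in a one-line helper so the result name never has to be typed by hand. This is a minimal sketch; `chaosresult_name` is a hypothetical function name, and the arguments in the usage note come from the sample ChaosEngine on this page.

```shell
#!/bin/sh
# Hypothetical helper: build the ChaosResult name from the
# ChaosEngine name ($1) and the ChaosExperiment name ($2).
chaosresult_name() {
  printf '%s-%s\n' "$1" "$2"
}
```

With the sample manifest, `kubectl describe chaosresult "$(chaosresult_name nginx-chaos-app-health k8-pod-delete)" -n default` describes the result of that engine's run.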

Check ChaosExperiment logs

  • Check the logs and result of the existing experiment

    kubectl logs -f k8-pod-delete-<> -n <chaos-namespace>
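
The suffix of the experiment pod is generated at run time, so it has to be looked up before tailing the logs. The sketch below filters a pod listing for names that start with the experiment name; `experiment_pods` is a hypothetical helper, and it assumes the listing comes from `kubectl get pods -o name`, which prints one `pod/<name>` entry per line.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods -o name` output on stdin and
# keep only the pods whose name starts with "<experiment>-".
experiment_pods() {
  grep "^pod/$1-"
}
```

A possible use is `kubectl get pods -n <chaos-namespace> -o name | experiment_pods k8-pod-delete`, followed by `kubectl logs -f` on the pod it prints.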