id | title | sidebar_label |
---|---|---|
openebs-pool-disk-loss | OpenEBS Pool Disk Loss Experiment Details | Pool Disk Loss |
Type | Description | Tested K8s Platform |
---|---|---|
OpenEBS | The OpenEBS Pool Disk Loss experiment disrupts the state of infrastructure resources by injecting disk loss against an OpenEBS storage pool. | GKE, AWS (KOPS) |
- Ensure that the Litmus Chaos Operator is running by executing `kubectl get pods` in the operator namespace (typically, `litmus`). If not, install from here
- Ensure that the `openebs-pool-disk-loss` experiment resource is available in the cluster by executing `kubectl get chaosexperiments` in the specified namespace. If not, install from here
- `DATA_PERSISTENCE` can be enabled by providing the application's info in a configmap volume so that the experiment can perform the necessary checks. Currently, LitmusChaos supports data consistency checks only for `MySQL` and `Busybox`.
- For the MySQL data persistence check, create a configmap as shown below in the application namespace (replace with actual credentials):
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: openebs-pool-disk-loss
data:
  parameters.yml: |
    dbuser: root
    dbpassword: k8sDem0
    dbname: test
- For the Busybox data persistence check, create a configmap as shown below in the application namespace (replace with the actual values):
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: openebs-pool-disk-loss
data:
  parameters.yml: |
    blocksize: 4k
    blockcount: 1024
    testfile: exampleFile
- There should be administrative access to the platform on which the cluster is hosted, as the recovery of the affected node could be manual (for example, gcloud access to the project). The cloud credentials are supplied through a secret:
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials or GCP service account respectively
- Ensure that the chaosServiceAccount used for the experiment has cluster-scope permissions, as the experiment may carry out the chaos in the `openebs` namespace while performing application health checks in the application's own namespace.
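
The first two prerequisites can be checked from a script before the experiment is started. The following is a minimal sketch, not part of the experiment itself: the function name `check_prereqs` and its two positional arguments (operator namespace and experiment namespace) are hypothetical, and it assumes that at least one pod in the `Running` phase in the operator namespace means the operator is up.

```shell
#!/bin/sh
# Sketch: verify the Litmus prerequisites before running the experiment.
# check_prereqs and its arguments are illustrative names, not Litmus tooling.

check_prereqs() {
  operator_ns="${1:-litmus}"     # namespace of the Litmus Chaos Operator
  experiment_ns="${2:-default}"  # namespace holding the ChaosExperiment

  # Count pods in the Running phase in the operator namespace.
  running=$(kubectl get pods -n "$operator_ns" \
    --field-selector=status.phase=Running -o name | wc -l | tr -d ' ')
  if [ "$running" -eq 0 ]; then
    echo "no running pods found in namespace $operator_ns" >&2
    return 1
  fi

  # The ChaosExperiment resource must exist in the target namespace.
  if ! kubectl get chaosexperiments openebs-pool-disk-loss \
      -n "$experiment_ns" >/dev/null 2>&1; then
    echo "openebs-pool-disk-loss not found in namespace $experiment_ns" >&2
    return 1
  fi

  echo "prerequisites look satisfied"
}
```

Calling `check_prereqs litmus default` returns a non-zero status as soon as one of the two checks fails, so it can gate the later steps in a pipeline.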
- Application pods are healthy before chaos injection
- Application writes are successful on OpenEBS PVs
- The pool disk is healthy before chaos injection
- Application pods are healthy post chaos injection
- OpenEBS Storage pool pods are healthy
- The disk is healthy after chaos injection
If the experiment tunable DATA_PERSISTENCE is set to 'mysql' or 'busybox':
- Application data written prior to chaos is successfully retrieved/read
- Database consistency is maintained as per db integrity check utils
- This scenario validates the behaviour of stateful applications and OpenEBS disk pool upon disk loss.
- Injects disk loss on the specified OpenEBS disk pool and node pool
- Can test the stateful application's resilience to disk loss
- Disk loss is achieved using the `litmus` chaos library
- This chaos experiment can be triggered by creating a ChaosEngine resource on the cluster. To understand the values to provide in a ChaosEngine specification, refer to Getting Started
- Follow the steps in the sections below to create the chaosServiceAccount, prepare the ChaosEngine, and execute the experiment.
Use this sample RBAC manifest to create a chaosServiceAccount in the desired (app) namespace. This example contains the minimum cluster role permissions necessary to execute the experiment.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pool-disk-loss-sa
  namespace: default
  labels:
    name: pool-disk-loss-sa
    app.kubernetes.io/part-of: litmus
---
# Source: openebs/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pool-disk-loss-sa
  labels:
    name: pool-disk-loss-sa
    app.kubernetes.io/part-of: litmus
rules:
- apiGroups: ["","apps","litmuschaos.io","batch","extensions","storage.k8s.io","openebs.io"]
  resources: ["pods", "pods/log", "jobs", "events", "pods/exec", "cstorpools", "configmaps", "secrets", "storageclasses", "persistentvolumes", "persistentvolumeclaims", "cstorvolumereplicas", "chaosexperiments", "chaosresults", "chaosengines"]
  verbs: ["create","list","get","patch","update","delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pool-disk-loss-sa
  labels:
    name: pool-disk-loss-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pool-disk-loss-sa
subjects:
- kind: ServiceAccount
  name: pool-disk-loss-sa
  namespace: default
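
Once the RBAC manifest is applied, it can be useful to confirm that the service account really has cluster-wide access to the resources the experiment touches. The sketch below wraps `kubectl auth can-i` with impersonation. The helper names `sa_can` and `verify_rbac` are hypothetical, and the three resources checked are only a sample taken from the rules above, on the assumption that `cstorpools` belongs to the `openebs.io` group and `chaosresults` to `litmuschaos.io`.

```shell
#!/bin/sh
# Sketch: check the permissions granted to the chaosServiceAccount.
# sa_can and verify_rbac are illustrative helper names.

sa_can() {
  # usage: sa_can <verb> <resource> [sa-namespace] [sa-name]
  verb="$1"
  resource="$2"
  sa_ns="${3:-default}"
  sa_name="${4:-pool-disk-loss-sa}"
  # Impersonate the service account and test the verb across all namespaces.
  kubectl auth can-i "$verb" "$resource" \
    --as="system:serviceaccount:${sa_ns}:${sa_name}" --all-namespaces
}

verify_rbac() {
  # usage: verify_rbac [sa-namespace] [sa-name]
  for res in pods cstorpools.openebs.io chaosresults.litmuschaos.io; do
    if ! sa_can list "$res" "$@" >/dev/null; then
      echo "missing list permission on $res" >&2
      return 1
    fi
  done
  echo "rbac ok"
}
```

A typical sequence would be to apply the manifest (saved, for instance, under the hypothetical file name `rbac.yml`) and then run `verify_rbac default pool-disk-loss-sa`.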
- Provide the application info in `spec.appinfo`
- Provide the auxiliary applications info (ns & labels) in `spec.auxiliaryAppInfo`
- Override the experiment tunables if desired in `experiments.spec.components.env`
- Provide the configMaps and secrets in `experiments.spec.components.configMaps/secrets`. For more info, refer to Sample ChaosEngine
- To understand the values to provide in a ChaosEngine specification, refer to ChaosEngine Concepts
Variables | Description | Specify In ChaosEngine | Notes |
---|---|---|---|
APP_PVC | The PersistentVolumeClaim used by the stateful application | Mandatory | Corresponds to the PVC using OpenEBS cStor storage class |
CLOUD_PLATFORM | Cloud Platform name | Mandatory | Supported platforms: GKE, AWS |
PROJECT_ID | GCP project ID, leave blank if it's AWS | Mandatory | |
NODE_NAME | Node name of the cluster | Mandatory | |
DISK_NAME | Name of the external/cloud disk attached to the node | Mandatory | |
DEVICE_NAME | Name of the device to mount. Applies only to AWS. | Mandatory | |
ZONE_NAME | Zone name for GCP and region name for AWS | Mandatory | Note: Use REGION_NAME for AWS |
TOTAL_CHAOS_DURATION | Total duration for which disk loss is injected | Optional | Defaults to 60 seconds |
DATA_PERSISTENCE | Flag to perform data consistency checks on the application | Optional | Default value is disabled (empty/unset). It supports only `mysql` and `busybox`. Ensure that the configmap with the app details is created |
APP_CHECK | If set to true, the experiment checks the status of the application. | Optional | |
RAMP_TIME | Period to wait before and after injection of chaos, in seconds | Optional | |
OPENEBS_NAMESPACE | Namespace in which OpenEBS pods are deployed | Optional | |
INSTANCE_ID | A user-defined string that holds metadata/info about current run/instance of chaos. Ex: 04-05-2020-9-00. This string is appended as suffix in the chaosresult CR name. | Optional | Ensure that the overall length of the chaosresult CR is still < 64 characters |
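
The length limit noted for `INSTANCE_ID` can be checked before the engine is created. The sketch below builds the expected ChaosResult name from the engine name, the experiment name, and the optional instance ID, then tests it against the 64-character limit. The helper names are hypothetical, and the hyphen used to join the `INSTANCE_ID` suffix is an assumption, since the separator is not stated in this document.

```shell
#!/bin/sh
# Sketch: check that the ChaosResult name stays under 64 characters.
# Assumption: INSTANCE_ID is appended with a hyphen separator.

chaosresult_name() {
  # usage: chaosresult_name <engine-name> <experiment-name> [instance-id]
  engine="$1"
  experiment="$2"
  instance_id="${3:-}"
  name="${engine}-${experiment}"
  if [ -n "$instance_id" ]; then
    name="${name}-${instance_id}"
  fi
  printf '%s\n' "$name"
}

name_fits() {
  # Succeeds only when the resulting name is shorter than 64 characters.
  name=$(chaosresult_name "$@")
  [ "${#name}" -lt 64 ]
}
```

For example, `name_fits pool-chaos openebs-pool-disk-loss 04-05-2020-9-00` succeeds, while a very long engine name would make it fail.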
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: pool-chaos
  namespace: default
spec:
  # It can be active/stop
  engineState: 'active'
  # ex. values: ns1:name=percona,ns2:run=busybox
  auxiliaryAppInfo: ''
  appinfo:
    appns: 'default'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: pool-disk-loss-sa
  experiments:
    - name: openebs-pool-disk-loss
      spec:
        components:
          env:
            # provide the total chaos duration
            - name: TOTAL_CHAOS_DURATION
              value: '60'
            - name: APP_PVC
              value: 'demo-nginx-claim'
            # GKE and AWS supported
            - name: CLOUD_PLATFORM
              value: 'GKE'
            # Enter the project id for gcp only
            - name: PROJECT_ID
              value: 'litmus-demo-123'
            # Enter the node name
            - name: NODE_NAME
              value: 'demo-node-123'
            # Enter the disk name
            - name: DISK_NAME
              value: 'demo-disk-123'
            # Enter the device name
            - name: DEVICE_NAME
              value: '/dev/sdb'
            # Enter the zone name
            - name: ZONE_NAME
              value: 'us-central1-a'
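
Several tunables depend on the cloud platform, so a quick sanity check before applying the manifest can catch mismatches early. The following sketch encodes one reading of the tunables table: `PROJECT_ID` is set for GKE and left blank for AWS, `DEVICE_NAME` is needed on AWS, and `ZONE_NAME` is always set. The function `validate_tunables` is hypothetical and is not something the experiment runs itself.

```shell
#!/bin/sh
# Sketch: validate platform-specific tunables before creating the ChaosEngine.
# The rules below are an interpretation of the tunables table, not Litmus code.

validate_tunables() {
  # usage: validate_tunables <CLOUD_PLATFORM> <PROJECT_ID> <DEVICE_NAME> <ZONE_NAME>
  platform="$1"
  project_id="$2"
  device_name="$3"
  zone_name="$4"

  case "$platform" in
    GKE)
      [ -n "$project_id" ] || { echo "PROJECT_ID is required on GKE" >&2; return 1; }
      ;;
    AWS)
      [ -z "$project_id" ] || { echo "PROJECT_ID should be blank on AWS" >&2; return 1; }
      [ -n "$device_name" ] || { echo "DEVICE_NAME is required on AWS" >&2; return 1; }
      ;;
    *)
      echo "unsupported CLOUD_PLATFORM: $platform" >&2
      return 1
      ;;
  esac

  # Zone (GCP) or region (AWS) must always be provided.
  [ -n "$zone_name" ] || { echo "ZONE_NAME is required" >&2; return 1; }
  echo "tunables ok"
}
```

With the values from the sample manifest, the call would be `validate_tunables GKE litmus-demo-123 /dev/sdb us-central1-a`.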
- Create the ChaosEngine manifest prepared in the previous step to trigger the chaos.
kubectl apply -f chaosengine.yml
- If the chaos experiment is not executed, refer to the troubleshooting section to identify the root cause and fix the issues.
- Watch the behaviour of the application pod and the OpenEBS data replica/pool pods by setting up a watch on the respective namespaces:
watch -n 1 kubectl get pods -n <application-namespace>
watch -n 1 kubectl get pods -n <openebs-namespace>
- Check whether the application is resilient to the pool disk loss once the experiment (job) is completed. The ChaosResult resource naming convention is `<ChaosEngine-Name>-<ChaosExperiment-Name>`.
kubectl describe chaosresult pool-chaos-openebs-pool-disk-loss -n <application-namespace>
- A sample recording of this experiment execution is provided here.
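
For scripted runs, the verdict can be pulled out of the ChaosResult description instead of being read by eye. The sketch below assumes that the `kubectl describe chaosresult` output contains a line of the form `Verdict: <value>`. The helper name `chaos_verdict` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the verdict recorded in a ChaosResult.
# Assumption: the describe output contains a "Verdict: <value>" line.

chaos_verdict() {
  # usage: chaos_verdict <chaosresult-name> [namespace]
  result_name="$1"
  ns="${2:-default}"
  # Keep the last Verdict value seen and print it, if any.
  kubectl describe chaosresult "$result_name" -n "$ns" \
    | awk '/Verdict:/ { v = $2 } END { if (v != "") print v }'
}
```

For the sample engine above, the call would be `chaos_verdict pool-chaos-openebs-pool-disk-loss default`. An empty result suggests the experiment has not yet written a verdict.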