A Slalom DataOps Lab
- Create a new repository from an existing template repo.
- Clone and open the new repository on your local workstation.
- Customize the infrastructure and add credentials as needed.
- Use `terraform apply` to deploy the foundations of the core data lake environment, including three S3 buckets, the VPC, and the public and private subnets.
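
To give a sense of what that deployment covers, here is a minimal Terraform sketch of the kinds of resources involved. The template's modules define the real resources; every name and CIDR range below is an assumption for illustration only.

```hcl
# Illustration only: the template's modules define the real resources.
# All names and CIDR ranges here are assumptions.

resource "aws_s3_bucket" "data_lake" {
  # Three buckets; the roles (data, logging, metadata) are assumed.
  for_each = toset(["data", "logging", "metadata"])
  bucket   = "myproject-lake-${each.key}"
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.2.0/24"
}
```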
- One-time setup:
  - Install software:
    - Core DevOps Tools: http://docs.dataops.tk/setup
    - AWS CLI (optional, for "extra credit" exercises): `choco install awscli` (Windows) or `brew install awscli` (Mac)
- Environment setup (each time):
  - Open browser tabs:
    - The lab checklist (this page)
    - Linux Academy Playground - to create the "AWS Sandbox"
    - slalom-ggp/dataops-project-template - starting point for your new repo
- Create a new repo from the Slalom DataOps Template, clone the repo locally, and open it in VS Code.
- Get AWS credentials from Linux Academy.
- Use the Linux Academy link to log in to AWS in the web browser.
- In the `.secrets` folder, rename `aws-credentials.template` to `aws-credentials`.
- Copy-paste your AWS credentials (from the Linux Academy Playground) into the correct location within the template.
- In the `.secrets` folder, rename `aws-secrets-manager-secrets.yml.template` to `aws-secrets-manager-secrets.yml` (no additional secrets are needed in this exercise).
- Rename `infra-config-template.yml` to `infra-config.yml`.
- Within `infra-config.yml`, update your email address and your project shortname.
- Right-click the `infra` folder and select "Open in Terminal" (or, from an existing terminal window, run `cd infra`).
- In the terminal window that opens, run `terraform init`.
  - (Expected wait: 2-5 minutes while modules and providers are downloaded.)
- Optional: While you are waiting for `terraform init` to complete, you can open the `infra` folder and review the contents of these two files: `00_environment.tf` and `01_data-lake.tf`. (A sketch of the module-call pattern to look for appears after this checklist.)
- Once `terraform init` has completed, run `terraform apply` in the same terminal window. Review the suggested changes, then type 'yes' to deploy.
- Wait for `terraform apply` to complete (approx. 2 minutes).
- In the web browser, open the S3 section of your AWS console and confirm that the new data lake buckets have been created.
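
For the optional file review above, the pattern to look for in `01_data-lake.tf` is a Terraform module call with a local relative `source`. A minimal sketch, assuming a path and input names that the real file may not use verbatim:

```hcl
# Sketch of the module-call pattern in 01_data-lake.tf (assumed, not verbatim).
module "data_lake" {
  source = "../../catalog/aws/data-lake" # local relative reference (assumed path)

  # Inputs are illustrative; the template defines the real variables.
  name_prefix = "myproject-"
}
```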
These bonus exercises are optional.
- After completing the steps above, go to the output from `terraform apply` (or alternatively, run `terraform output`) and copy the provided `AWS User Switch` command.
- Paste the `AWS User Switch` command into your terminal so that the AWS CLI can correctly locate your AWS credentials.
- Upload `infra-config.yml` to the data bucket: `aws s3 cp ../infra-config.yml s3://...`
- List the bucket contents with `aws s3 ls s3://...`
- In the web browser, browse to the bucket and confirm the file has landed.
- Copy the contents of the airflow sample file into a new `.tf` file in the `infra` folder.
- Review the airflow module configuration and update the `source` value in the airflow file so that it matches the source prefix for the GitHub repo, replacing the `../..` relative references (see the before-and-after sketch at the end of this list).
  - Tip: You can use the file `01_data-lake.tf` as a sample for what `source` should look like when referencing an external module.
- Rerun `terraform apply` and note the error message. (Terraform fails here because `terraform init` must be rerun whenever new modules are added.)
- Rerun `terraform init` and then run `terraform apply` again.
- In the web browser, browse to your new Airflow instance.
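
The `source` update above amounts to swapping a local relative path for Terraform's git-based module source syntax. A before-and-after sketch, in which the repo URL, subfolder, and `ref` are assumptions to be matched against the actual GitHub repo:

```hcl
# Before: relative reference, resolvable only inside a local clone of the repo.
module "airflow" {
  source = "../../catalog/aws/airflow" # assumed path
}

# After: git-based source prefix pointing at the GitHub repo.
# Repo URL, subfolder, and ref are assumptions; use the real repo's values.
module "airflow" {
  source = "git::https://github.com/slalom-ggp/dataops-infra.git//catalog/aws/airflow?ref=main"
}
```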
For troubleshooting tips, please see the Lab Troubleshooting Guide.