GitHub - javiermtzo99/data-analysis-pipeline-bash-practice

Building a Data Analysis pipeline using a shell script tutorial

This example data analysis project counts the occurrences of every word in four novels and reports the 10 most frequent words in each book.

Current usage:

Set-up (first time only)

Clone this repo and, from the command line, navigate to the root of the project.
Run the following command to create the conda environment:

conda-lock install --name da-pipeline-sh conda-lock.yml

Run the analysis

Activate the conda environment:

conda activate da-pipeline-sh

Count the words:

python scripts/wordcount.py \
    --input_file=data/isles.txt \
    --output_file=results/isles.dat
python scripts/wordcount.py \
    --input_file=data/abyss.txt \
    --output_file=results/abyss.dat
python scripts/wordcount.py \
    --input_file=data/last.txt \
    --output_file=results/last.dat
python scripts/wordcount.py \
    --input_file=data/sierra.txt \
    --output_file=results/sierra.dat

Create the plots:

python scripts/plotcount.py \
    --input_file=results/isles.dat \
    --output_file=results/figure/isles.png
python scripts/plotcount.py \
    --input_file=results/abyss.dat \
    --output_file=results/figure/abyss.png
python scripts/plotcount.py \
    --input_file=results/last.dat \
    --output_file=results/figure/last.png
python scripts/plotcount.py \
    --input_file=results/sierra.dat \
    --output_file=results/figure/sierra.png
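
Before rendering the report, it can help to confirm that the two steps above produced every expected file. The following is a minimal sketch, not part of the original project: the function name `missing_outputs` is hypothetical, and it assumes the commands were run from the project root with the output paths shown above.

```shell
# List every expected output that is missing or empty.
# Prints nothing when all eight files are present and non-empty.
missing_outputs() {
    for book in isles abyss last sierra; do
        for path in "results/${book}.dat" "results/figure/${book}.png"; do
            # -s is true only if the file exists and has a size greater than zero
            [ -s "$path" ] || echo "$path"
        done
    done
}

missing=$(missing_outputs)
if [ -n "$missing" ]; then
    echo "Missing or empty outputs:" >&2
    echo "$missing" >&2
fi
```

If the check prints any paths, rerun the corresponding `wordcount.py` or `plotcount.py` command before moving on.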

Render the report:

quarto render report/count_report.qmd

Exercise:

Your task is to add a data analysis pipeline as a shell/bash script. Running it should accomplish the same steps as the commands outlined in this README.md file when you type:

bash runall.sh
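
One possible shape for such a script is sketched below. It is only a starting point under stated assumptions, not the reference solution: the helper names `run_stage` and `main` are hypothetical, the `mkdir -p` line assumes the output directories might not exist yet, and the sketch counts and plots each book in a single loop rather than running all counts first. The commands themselves are the ones listed under "Run the analysis", and the conda environment is assumed to be active already.

```shell
#!/usr/bin/env bash
# Hypothetical runall.sh sketch: run every stage in order, stop at the first failure.

# Announce a stage, run its command, and report a failure on stderr.
run_stage() {
    stage_name=$1
    shift
    echo "==> ${stage_name}"
    "$@" || { echo "Stage failed: ${stage_name}" >&2; return 1; }
}

main() {
    # The relative paths below assume the script is started from the project root.
    if [ ! -f scripts/wordcount.py ]; then
        echo "Run this script from the root of the project." >&2
        return 1
    fi

    # Assumption: the output directories may not exist yet.
    mkdir -p results/figure

    for book in isles abyss last sierra; do
        run_stage "count ${book}" python scripts/wordcount.py \
            --input_file="data/${book}.txt" \
            --output_file="results/${book}.dat" || return 1
        run_stage "plot ${book}" python scripts/plotcount.py \
            --input_file="results/${book}.dat" \
            --output_file="results/figure/${book}.png" || return 1
    done

    run_stage "render report" quarto render report/count_report.qmd
}

main "$@"
```

Each stage is followed by `|| return 1`, so a failed word count never leads to a plot or report built from stale data.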

Dependencies

Quarto
Python & Python libraries:
- click
- matplotlib
- pandas
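
To check that these dependencies are available in the active environment, something like the following could be used. This is a hedged sketch: the function name `missing_tools` is hypothetical, and it assumes the tools are expected on `PATH` under the names `quarto` and `python`.

```shell
# Print the name of every given command-line tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools quarto python)
if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
else
    # Importing the three libraries fails loudly if any of them is not installed.
    python -c "import click, matplotlib, pandas" \
        || echo "One or more Python libraries are missing." >&2
fi
```

Run it after `conda activate da-pipeline-sh`; no output means everything was found.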

