Pedro-A-D-S committed Jul 21, 2023
2 parents afaf634 + 7d188cd commit bbb6770
Showing 19 changed files with 869 additions and 464 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
concrete/*
data/*
7 changes: 6 additions & 1 deletion README.md
@@ -52,6 +52,11 @@ pip install src/requirements.txt
```
3. Execute the notebooks in the specified order, ensuring that the dataset and necessary files are correctly referenced.


## Dask as the Runtime Engine

The data processing step uses **Dask**, a parallel computing library, to handle large datasets. Dask splits the work into tasks that run in parallel across multiple cores or, with a distributed scheduler, across multiple machines, which makes the data processing pipelines faster and more scalable.

Feel free to experiment with different regression algorithms and hyperparameter tuning to further enhance the model performance. Share your feedback and contribute to this project to help us improve and expand its capabilities.

## MLflow Integration
@@ -71,4 +76,4 @@ For inquiries or further information, please contact me at:
- LinkedIn: https://www.linkedin.com/in/pedro-a-d-s/

## License
This project is licensed under the MIT License.
This project is licensed under the MIT License.
Binary file modified data/1-bronze/Concrete_Data_Cleaned.parquet
Binary file not shown.
11 changes: 1 addition & 10 deletions notebooks/01-EDA/01_EDA_strength_prediction.ipynb
@@ -81,11 +81,9 @@
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import itertools\n",
"from scipy import stats\n",
"%matplotlib inline\n",
"sns.set_style(\"white\")"
@@ -184,7 +182,7 @@
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('../../data/0. raw-data/Concrete_Data.csv')"
"df = pd.read_csv('../../data/0-raw-data/Concrete_Data.csv')"
]
},
{
@@ -2567,13 +2565,6 @@
"source": [
"df.to_parquet('../../data/2-silver/Concrete_Data_Cleaned.parquet', index = False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {