The Bike Rental Prediction Model (ONGOING PROJECT) uses historical data and Scikit-Learn's RandomForestRegressor to forecast daily bike rental demand from factors such as weather, holidays, and season. The forecasts are intended to help the business optimize staffing, inventory management, and revenue.
- Programming Languages: Python
- Libraries/Frameworks:
  - Data Manipulation & Analysis: Pandas, NumPy
  - Data Visualization: Matplotlib, Seaborn
  - Machine Learning: Scikit-Learn (RandomForestRegressor)
  - Data Preprocessing: Scikit-Learn, Pandas
  - Model Export: Joblib
  - Model Evaluation: Scikit-Learn (MSE, R-Squared)
- Data Storage: CSV files
- Version Control: Git
- Development Environment: VSCode
Data Collection:
- Download data from the following sources:
  - Original Source: http://capitalbikeshare.com/system-data
  - Weather Info: http://www.freemeteo.com
  - Holiday Schedule: http://dchr.dc.gov/page/holiday-schedule
  - Data Downloaded for Model Development: https://www.kaggle.com/datasets/marklvl/bike-sharing-dataset?select=hour.csv
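
Because the downloaded file is hourly while the project forecasts daily demand, the data likely needs to be aggregated per day before modeling. The following is a minimal sketch of that step, not the project's actual code. The column names (`dteday`, `cnt`, `temp`, `hum`, `windspeed`, `holiday`, `workingday`, `season`) follow the published dataset; the helper name `to_daily` and the path `data/hour.csv` are assumptions made for illustration.

```python
import pandas as pd


def to_daily(hourly: pd.DataFrame) -> pd.DataFrame:
    """Collapse hourly rows into one row per calendar day (hypothetical helper)."""
    hourly = hourly.copy()
    hourly["dteday"] = pd.to_datetime(hourly["dteday"])
    daily = (
        hourly.groupby("dteday")
        .agg(
            cnt=("cnt", "sum"),                # total rentals for the day
            temp=("temp", "mean"),             # average normalized temperature
            hum=("hum", "mean"),               # average normalized humidity
            windspeed=("windspeed", "mean"),   # average normalized wind speed
            holiday=("holiday", "max"),        # 1 if the day is a holiday
            workingday=("workingday", "max"),  # 1 if the day is a working day
            season=("season", "first"),        # season code for the day
        )
        .reset_index()
    )
    return daily


# Tiny in-memory stand-in for hour.csv so the sketch runs without the file.
sample = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-01", "2011-01-02", "2011-01-02"],
    "cnt": [10, 20, 30, 40],
    "temp": [0.2, 0.4, 0.3, 0.5],
    "hum": [0.8, 0.6, 0.7, 0.5],
    "windspeed": [0.1, 0.3, 0.2, 0.2],
    "holiday": [0, 0, 0, 0],
    "workingday": [0, 0, 0, 0],
    "season": [1, 1, 1, 1],
})
daily = to_daily(sample)

# With the real file (assumed local path):
# daily = to_daily(pd.read_csv("data/hour.csv"))
```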
Exploratory Data Analysis (EDA):
- Perform EDA to explore the distribution of bike rentals.
- Visualize correlations and trends using Matplotlib and Seaborn.
- Identify key features impacting bike rentals, such as weather, holidays, and seasonal changes.
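
The EDA steps above could start from a few Pandas summaries before any plotting. This is a rough sketch on made-up numbers: the small `daily` frame is hypothetical sample data, and only the column names (`temp`, `hum`, `holiday`, `cnt`) are taken from the dataset.

```python
import pandas as pd

# Hypothetical daily sample; real values would come from the downloaded dataset.
daily = pd.DataFrame({
    "temp": [0.2, 0.4, 0.6, 0.8, 0.5, 0.3],
    "hum": [0.8, 0.6, 0.5, 0.4, 0.7, 0.9],
    "holiday": [0, 0, 0, 0, 1, 0],
    "cnt": [1200, 2500, 4100, 5200, 3000, 1500],
})

# Summary statistics describing the distribution of daily rentals.
summary = daily["cnt"].describe()

# Pearson correlation of each feature with the rental count,
# ordered by absolute strength (strongest first).
corr = daily.corr()["cnt"].drop("cnt")
corr_with_cnt = corr.reindex(corr.abs().sort_values(ascending=False).index)

# Average demand on holidays (1) versus regular days (0).
by_holiday = daily.groupby("holiday")["cnt"].mean()

print(summary)
print(corr_with_cnt)
print(by_holiday)
```

For the visual version, passing `daily.corr()` to Seaborn's `heatmap` function is one common option.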
Model Development:
- Split the data into training and testing sets.
- Train the RandomForestRegressor model and tune hyperparameters.
- Evaluate model performance using metrics like Mean Squared Error (MSE) and R-squared.
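
The split, train, and evaluate steps above can be sketched as follows. The data here is synthetic (random features standing in for temperature, humidity, and wind speed), and the settings `test_size=0.2`, `n_estimators=100`, and `random_state=42` are illustrative assumptions rather than the project's tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the daily feature matrix and rental counts.
rng = np.random.default_rng(42)
X = rng.random((300, 3))
y = 4000 * X[:, 0] - 1500 * X[:, 1] + rng.normal(0, 100, 300)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the random forest on the training portion only.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Score the held-out rows with the two metrics named above.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.1f}, R-squared: {r2:.3f}")
```

Since rental data is ordered in time, a chronological split (for example `shuffle=False` in `train_test_split`) may give a more realistic estimate than a random one; which is appropriate depends on how the model will be used.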
Save Model for Future Use:
- Use `joblib` to save the model for future deployment and predictions.
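
A minimal sketch of the save-and-reload round trip with `joblib.dump` and `joblib.load`. The model is trained on synthetic data, and the file name `bike_rental_model.joblib` is a hypothetical choice for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic model so the example is self-contained.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = X.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Write the fitted model to disk, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bike_rental_model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same_predictions = np.allclose(model.predict(X), restored.predict(X))
print("Predictions match after reload:", same_predictions)
```

A saved model should generally be loaded with the same Scikit-Learn version that produced it, and only from files you trust.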
Documentation and Reporting:
- Document the entire process, from data sources to model details.
- Provide visualizations for actual vs. predicted values.
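
One common way to show actual vs. predicted values is a scatter plot with a diagonal reference line. The sketch below uses made-up numbers; `y_test`, `y_pred`, and the output file name are hypothetical placeholders for the real test targets and model predictions.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical actual and predicted daily rental counts.
y_test = np.array([1200, 2500, 4100, 5200, 3000, 1500])
y_pred = np.array([1350, 2400, 3900, 5000, 3300, 1700])

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(y_test, y_pred, alpha=0.7)

# Points on the dashed diagonal would be perfect predictions.
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, linestyle="--", color="gray", label="perfect prediction")

ax.set_xlabel("Actual rentals")
ax.set_ylabel("Predicted rentals")
ax.set_title("Actual vs. predicted daily rentals")
ax.legend()

# Save to a temporary folder; a report would use a project path instead.
out_path = os.path.join(tempfile.mkdtemp(), "actual_vs_predicted.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```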
Future Work:
- Update the dataset periodically and test the model with new data.
- Experiment with different models and hyperparameters for better performance.
- Consider real-time predictions in a deployment scenario.
- Data Sources: Original bike-sharing data, weather data, and holiday schedules were used to build this model.
- Model Updates: The model is designed for future improvements, such as using a more up-to-date dataset and experimenting with different models.
- Deployment: The model is exportable for deployment in production environments using `joblib`.
- Evaluation: Metrics such as MSE and R-squared were used to measure the model's performance.
- Maintenance: Future work will include refining the model with additional data and testing deployment scenarios for real-time predictions.
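
The hyperparameter experiments mentioned under Future Work could be run with Scikit-Learn's `GridSearchCV`. This is only a sketch on synthetic data: the parameter grid, the 3-fold setting, and the scoring choice are illustrative assumptions, not values the project has settled on.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the daily features and rental counts.
rng = np.random.default_rng(1)
X = rng.random((120, 3))
y = 4000 * X[:, 0] - 1500 * X[:, 1] + rng.normal(0, 100, 120)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestRegressor(random_state=1),
    param_grid,
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_mse = -search.best_score_  # flip the sign back to a plain MSE
print("Best parameters:", best_params)
print(f"Cross-validated MSE: {best_mse:.1f}")
```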