- Predict whether a person earns more than $50K a year, given their demographic attributes.
- To achieve this, several classification techniques are explored; the random forest model yields the best prediction results.
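As a rough illustration of comparing classifiers, the sketch below fits three models on synthetic data and reports held-out accuracy for each. It is not the notebook's code: it assumes scikit-learn is available, the data comes from `make_classification` rather than the census file, and the two non-forest models are arbitrary picks for the comparison. The scores it prints say nothing about the real income dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the census features; NOT the real income data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hypothetical candidate set; the notebook may explore different models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy on the held-out rows.
    scores[name] = model.score(X_test, y_test)

for name in sorted(scores):
    print("{}: {:.3f}".format(name, scores[name]))
```

Holding out a test split keeps the comparison honest: each model is scored on rows it did not see during fitting.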
This project requires Python 2.7 and the following Python libraries installed:
You will also need software installed to run a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer.
Template code is provided in the Income_Classification.ipynb Jupyter Notebook file.
In a terminal or command window, navigate to the top-level project directory (that contains this README) and run one of the following commands:
jupyter notebook Income_Classification.ipynb
or
ipython notebook Income_Classification.ipynb
This will open the Jupyter Notebook software and project file in your web browser.
The income dataset was extracted from the 1994 U.S. Census database.
workclass
: Individual's work category
education
: Individual's highest education degree
marital-status
: Individual's marital status
occupation
: Individual's occupation
relationship
: Individual's relation in a family
race
: Race of the individual
sex
native-country
: Individual's native country
age
: Age of the individual
education-num
: Individual's years of education received
fnlwgt
: The weights on the CPS files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau.
capital-gain
capital-loss
hours-per-week
: Individual's working hours per week
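The features above fall into text-valued and numeric groups, and a loader has to treat them differently. The following is a minimal sketch of that split using only the standard library. The two sample rows are made up for illustration, and the `income` column name and the `>50K` / `<=50K` label strings are assumptions about how the target is stored; the project's actual file may differ.

```python
import csv

# Feature names taken from the list above.
CATEGORICAL = ["workclass", "education", "marital-status", "occupation",
               "relationship", "race", "sex", "native-country"]
CONTINUOUS = ["age", "education-num", "fnlwgt", "capital-gain",
              "capital-loss", "hours-per-week"]

# Made-up rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """\
age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,income
42,Private,150000,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
23,Private,120000,HS-grad,9,Never-married,Sales,Own-child,Black,Female,0,0,30,United-States,<=50K
"""


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    record = {}
    for name in CATEGORICAL:
        record[name] = row[name].strip()
    for name in CONTINUOUS:
        record[name] = int(row[name])
    # Assumed target encoding: True when the label is ">50K".
    record["over_50k"] = row["income"].strip() == ">50K"
    return record


# DictReader accepts any iterable of lines, so no file is needed here.
records = [parse_row(r) for r in csv.DictReader(SAMPLE_CSV.splitlines())]

for rec in records:
    print(rec["age"], rec["education"], rec["over_50k"])
```

Reading columns by name rather than by position means the sketch does not depend on the column order of the source file.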