Commit

just some notebook cleanup
brifordwylie committed Mar 24, 2014
1 parent bb222ac commit 1eff46d
Showing 1 changed file with 13 additions and 9 deletions.
22 changes: 13 additions & 9 deletions pefile_classification/pefile_classification.ipynb
@@ -375,7 +375,7 @@
"source": [
"<div style=\"float: left; padding: 0px 20px 0px 20px;\"><img src=\"files/images/transformers.png\" width=\"300px\"></div>\n",
"# Data Transformation: \n",
"## Going from a list of python dictionaries to a Pandas DataFrame. Pandas has all sort of different ways to create a data frame."
"** Going from a list of python dictionaries to a Pandas DataFrame. Pandas has all sort of different ways to create a data frame. **"
]
},
{
@@ -636,11 +636,7 @@
"source": [
"<div style=\"float: left; margin: 0px 30px 0px 0px\"><img src=\"files/images/eyeball_2.jpg\" width=\"250px\"></div>\n",
"# Lets look at the Data\n",
"We're going to use some nice functionality in the Pandas dataframe to look at our processed data:\n",
"\n",
" - We can use groupby on the dataframe to see the different header request keys for various agents\n",
" - Transform the complicated user-agent string into something more managable (short-agent)\n",
" - Generate a 'feature vector' from the header keys"
"We're going to use some nice functionality in the Pandas dataframe to look at our processed data:"
]
},
{
@@ -1213,7 +1209,7 @@
"source": [
"<div style=\"float: left; padding: 0px 20px 0px 20px;\"><img src=\"files/images/transformers.png\" width=\"300px\"></div>\n",
"# Data Transformation: \n",
"## Going from a Pandas DataFrame to an X Matrix and a y vector so we can utilize all of the great scikit-learn algorithms."
"** Going from a Pandas DataFrame to an X Matrix and a y vector so we can utilize all of the great scikit-learn algorithms. **"
]
},
{
@@ -1312,7 +1308,7 @@
"cell_type": "code",
"collapsed": false,
"input": [
"# Now plot the results of the 80/20 split in a confusion matrix\n",
"# Now plot the results of the 60/40 split in a confusion matrix\n",
"from sklearn.metrics import confusion_matrix\n",
"labels = ['good', 'bad']\n",
"cm = confusion_matrix(y_test, y_pred, labels)\n",
@@ -1347,6 +1343,14 @@
],
"prompt_number": 135
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Features, predictive performance and 'knobs'\n",
"Here we going to explore some of the ways you can adjust the 'knobs' associated with either the feature input into your ML algorithm or the prediction probability methods that many classes in scikit-learn have."
]
},
{
"cell_type": "code",
"collapsed": false,
@@ -1356,7 +1360,7 @@
"no_label.remove('label')\n",
"X = df.as_matrix(no_label)\n",
"\n",
"# 80/20 Split for predictive test\n",
"# 60/40 Split for predictive test\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=my_tsize, random_state=my_seed)\n",
"clf.fit(X_train, y_train)\n",
"y_pred = clf.predict(X_test)\n",
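
For readers who want to try the steps this diff touches (DataFrame to X/y, a 60/40 split, a confusion matrix, and a prediction-probability 'knob'), here is a minimal, self-contained sketch. It is not the notebook's code: the toy records, the column names `feature_a`/`feature_b`, the `RandomForestClassifier` choice, the values of `my_tsize` and `my_seed`, and the 0.8 threshold are all assumptions made for illustration. It also uses `to_numpy()` and the keyword form `labels=labels`, since recent pandas and scikit-learn releases no longer accept `df.as_matrix(...)` or a positional `labels` argument as written in the diff.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy records standing in for the notebook's extracted features.
records = [
    {"feature_a": i % 7, "feature_b": (i * 3) % 5,
     "label": "bad" if i % 7 > 3 else "good"}
    for i in range(100)
]

# List of python dictionaries -> Pandas DataFrame.
df = pd.DataFrame(records)

# DataFrame -> X matrix and y vector (every column except 'label' is a feature).
no_label = [col for col in df.columns if col != "label"]
X = df[no_label].to_numpy()
y = df["label"].to_numpy()

# 60/40 split for the predictive test (values assumed, not taken from the notebook).
my_tsize = 0.4
my_seed = 123
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=my_tsize, random_state=my_seed
)

# Assumed classifier; the diff does not show how `clf` is constructed.
clf = RandomForestClassifier(n_estimators=50, random_state=my_seed)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Confusion matrix with rows = true label, columns = predicted label.
labels = ["good", "bad"]
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)

# One 'knob': only call a sample 'bad' when the predicted probability is high.
bad_idx = list(clf.classes_).index("bad")
bad_scores = clf.predict_proba(X_test)[:, bad_idx]
y_strict = np.where(bad_scores >= 0.8, "bad", "good")
print(confusion_matrix(y_test, y_strict, labels=labels))
```

Raising the threshold trades fewer false 'bad' calls for more missed ones; comparing the two printed matrices shows the effect on whatever data you substitute for the toy records.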
