Predicting the manner in which exercises are done
========================================================
Load the training data.
```{r}
tr <- read.csv("pml-training.csv")
```
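The text-to-number conversion in the next step is only needed because some numeric columns contain non-numeric tokens. As an alternative, `read.csv()` can treat those tokens as missing at load time through its `na.strings` argument. The sketch below assumes the offending entries are blank cells and spreadsheet `#DIV/0!` errors (check this against your copy of `pml-training.csv`) and uses a small made-up CSV rather than the project data.

```{r, eval = FALSE}
# Toy CSV (hypothetical values): a numeric column polluted by an empty
# cell and a spreadsheet "#DIV/0!" error
csv_text <- "roll,kurt\n1.5,\n2.0,#DIV/0!\n3.5,0.7\n"

# Default reading: the stray tokens stop "kurt" from being parsed as numeric
# (it becomes character in R >= 4.0, factor in earlier versions)
raw <- read.csv(text = csv_text)
str(raw)

# Declaring the tokens as missing lets read.csv() parse "kurt" as numeric
clean <- read.csv(text = csv_text, na.strings = c("NA", "", "#DIV/0!"))
str(clean)
```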
Drop the first 7 columns, which hold things like time stamps, user names, and other metadata about each sample rather than sensor measurements. Then convert the 152 predictor columns, some of which were imported as text, to numeric values, leaving the last column, `classe`, as the outcome.
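```{r, message = FALSE, warning = FALSE}
tr.fix <- tr[, 8:160]
# Columns 1-152 are predictors; column 153 is the outcome, classe.
# Convert via character (as.numeric() on a factor returns level codes); non-numeric entries become NA.
tr.fix[1:152] <- suppressWarnings(lapply(tr.fix[1:152], function(x) as.numeric(as.character(x))))
# randomForest() only does classification when the response is a factor
tr.fix$classe <- factor(tr.fix$classe)
```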
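The detour through text matters when a column was read as a factor, which `read.csv()` did for text columns before R 4.0: `as.numeric()` on a factor returns its internal level codes, not the numbers printed on the labels. A minimal illustration on a hypothetical factor (`paste()` and `as.character()` give the same character vector here):

```{r, eval = FALSE}
# Toy factor whose labels look like numbers, plus one non-numeric label
f <- factor(c("10.5", "2", "10.5", "#DIV/0!"))

# Wrong: level codes (positions in levels(f)), not the original values
codes <- as.numeric(f)

# Right: go through character first; the non-numeric label becomes NA
# (the coercion warning is suppressed)
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(values)
```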
Drop every column that contains at least one NA.
```{r}
# Count the missing values in each column and keep only the complete columns
na.sums <- colSums(is.na(tr.fix))
tr.fix.nona <- tr.fix[, na.sums == 0]
```
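A minimal sketch of the column filter on a small hypothetical data frame: `colSums(is.na(df))` gives the same per-column NA counts as `apply(df, 2, function(x) sum(is.na(x)))`, and indexing with `na.counts == 0` keeps only the complete columns.

```{r, eval = FALSE}
# Toy stand-in for the cleaned training data (made-up values)
df <- data.frame(a = c(1, 2, 3), b = c(NA, 5, 6), c = c(7, NA, NA),
                 classe = c("A", "B", "A"))

# Number of NAs per column: a = 0, b = 1, c = 2, classe = 0
na.counts <- colSums(is.na(df))

# Keep only the columns with no missing values
df.nona <- df[, na.counts == 0]
names(df.nona)
```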
Fit a random forest model to the cleaned training data.
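```{r, message = FALSE, warning = FALSE}
library(randomForest)
set.seed(1975)  # make the bootstrap samples reproducible
# na.roughfix is only a safeguard: no columns with NAs remain at this point
fit <- randomForest(classe ~ ., data = tr.fix.nona, na.action = na.roughfix)
```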
Let's take a look at the model.
```{r}
fit
```
The out-of-bag (OOB) error rate serves as an estimate of the expected out-of-sample error: **0.25%**.
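The OOB error can stand in for a held-out test error because each tree in a random forest is grown on a bootstrap sample of the training rows, so roughly a third of the rows (a fraction approaching exp(-1), about 0.368, as the sample grows) are left out of any given tree; each row is then predicted only by the trees that never saw it. The base-R sketch below uses a toy size, not the project data, to draw one bootstrap sample and measure its out-of-bag fraction.

```{r, eval = FALSE}
set.seed(1975)
n <- 10000

# One bootstrap sample: n row indices drawn with replacement,
# as done for each individual tree in a random forest
boot.idx <- sample.int(n, size = n, replace = TRUE)

# Rows never drawn are "out of bag" for this tree
oob <- setdiff(seq_len(n), boot.idx)

# Expected value is (1 - 1/n)^n, which approaches exp(-1) for large n
oob.frac <- length(oob) / n
oob.frac
```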
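Now let's move to the test data and predict the manner in which the exercises were done. Read the test file and convert its 152 predictor columns the same way, this time without dropping the NA columns: the model only uses the predictors it was trained on, so the extra columns are ignored.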
```{r, message = FALSE, warning = FALSE}
tst <- read.csv("pml-testing.csv")
tst.fix <- tst[, 8:159]  # the 152 predictor columns
# Same conversion as for the training data: via character, non-numeric entries become NA
tst.fix[1:152] <- suppressWarnings(lapply(tst.fix[1:152], function(x) as.numeric(as.character(x))))
```
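Since the training and test sets must be cleaned identically, one option is to wrap the conversion in a function so the two steps cannot drift apart. `to.numeric.cols` below is a hypothetical helper, not part of the original analysis, demonstrated on a toy data frame.

```{r, eval = FALSE}
# Hypothetical helper: convert the selected columns to numeric, going through
# character so factor columns keep their values instead of their level codes
to.numeric.cols <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) {
    suppressWarnings(as.numeric(as.character(x)))  # non-numbers become NA
  })
  df
}

# Toy input: one factor column and one character column with a bad token
toy <- data.frame(x = factor(c("1.5", "2")), y = c("3", "#DIV/0!"),
                  stringsAsFactors = FALSE)
fixed <- to.numeric.cols(toy, 1:2)
str(fixed)
```

With the real files it would be called as `to.numeric.cols(tr[, 8:160], 1:152)` and `to.numeric.cols(tst[, 8:159], 1:152)`.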
Finally, predict the classes of the test samples:
```{r}
answers <- predict(fit, newdata=tst.fix); answers
```
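A cheap sanity step before trusting the predictions is to confirm that the test set supplies every predictor the model uses, with no missing values in those columns. The sketch below shows the check on toy data frames; the column names and values are hypothetical stand-ins for `tr.fix.nona` and `tst.fix`.

```{r, eval = FALSE}
# Toy stand-ins: training data with the response, test data with an extra NA column
train <- data.frame(roll_belt = c(1.2, 1.5), pitch_belt = c(8.1, 7.9),
                    classe = factor(c("A", "B")))
test  <- data.frame(roll_belt = 1.3, pitch_belt = 8.0,
                    kurtosis_roll_belt = NA)

# Predictors the model needs: every training column except the response
needed <- setdiff(names(train), "classe")

absent <- setdiff(needed, names(test))  # predictors missing from the test set
has.na <- anyNA(test[needed])           # NAs in the predictors actually used
```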