---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  warning = FALSE,
  message = FALSE
)
```
# tidytable <img id="logo" src="man/figures/logo.png" align="right" width="17%" height="17%" />
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/tidytable)](https://cran.r-project.org/package=tidytable)
![r-universe](https://fastverse.r-universe.dev/badges/tidytable)
[![downloads](https://cranlogs.r-pkg.org/badges/grand-total/tidytable?color=blue)](https://r-pkg.org/pkg/tidytable)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/tidytable?color=blue)](https://markfairbanks.github.io/tidytable/)
[![R-CMD-check](https://github.com/markfairbanks/tidytable/workflows/R-CMD-check/badge.svg)](https://github.com/markfairbanks/tidytable/actions)
<!-- badges: end -->
`tidytable` is a data frame manipulation library for users who need [`data.table` speed](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html) but prefer `tidyverse`-like syntax.
## Installation
Install the released version from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("tidytable")
```
Or install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("pak")
pak::pak("markfairbanks/tidytable")
```
## General syntax
`tidytable` replicates `tidyverse` syntax but uses `data.table` in the background. In general, you can load `library(tidytable)` to replace your existing `dplyr` and `tidyr` code with `data.table`-backed equivalents.

A full list of implemented functions can be found [here](https://markfairbanks.github.io/tidytable/reference/index.html).
```{r}
library(tidytable)
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
```
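
The same drop-in idea applies to `tidyr`-style verbs. The sketch below reshapes a small wide table with `pivot_longer()`; the data and the names `wide_df` and `long_df` are hypothetical and only serve as an illustration.

```{r}
library(tidytable)

# Hypothetical wide data: one row per id, one column per year
wide_df <- data.table(id = 1:2, y2020 = c(10, 20), y2021 = c(30, 40))

# Stack the year columns into name/value pairs
long_df <- wide_df %>%
  pivot_longer(cols = c(y2020, y2021),
               names_to = "year",
               values_to = "value")

long_df
```

The result has one row per `id`/`year` pair, so code written against `tidyr::pivot_longer()` should usually need no changes beyond loading `tidytable`.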
## Applying functions by group
You can use the normal `tidyverse` `group_by()`/`ungroup()` workflow, or you can use `.by` syntax to reduce typing. Using `.by` in a function is shorthand for `df %>% group_by() %>% some_function() %>% ungroup()`.
* A single column can be passed with `.by = z`
* Multiple columns can be passed with `.by = c(y, z)`
```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)
df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
```
All functions that can operate by group (`mutate()`, `filter()`, `summarize()`, etc.) have a `.by` argument built in.
The above syntax is equivalent to:
```{r}
df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
```
Both options are available, so you can use whichever syntax you prefer.
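
As a further sketch of `.by` outside of `summarize()`, the example below uses it in `mutate()` and `filter()`. The object names `with_avg` and `top_z` are hypothetical; the data matches the example above.

```{r}
library(tidytable)

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

# Add a per-group mean while keeping every row
with_avg <- df %>%
  mutate(avg_z = mean(z), .by = x)

# Keep only the row with the largest z within each group
top_z <- df %>%
  filter(z == max(z), .by = x)

with_avg
top_z
```

In both calls the grouping applies only to that single step, so no `ungroup()` is needed afterwards.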
## tidyselect support
`tidytable` allows you to select/drop columns just like you would in the tidyverse by utilizing the [`tidyselect`](https://tidyselect.r-lib.org) package in the background.
Normal selection can be mixed with all `tidyselect` helpers: `everything()`, `starts_with()`, `ends_with()`, `any_of()`, `where()`, etc.
```{r}
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
```
A full overview of selection options can be found [here](https://tidyselect.r-lib.org/reference/language.html).
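
A few more selection patterns, sketched on the same data. The object names `num_df`, `no_b`, and `some_df` and the column name `"not_a_column"` are hypothetical.

```{r}
library(tidytable)

df <- data.table(a = 1:3, b1 = 4:6, b2 = 7:9, c = c("a", "a", "b"))

# Keep only the numeric columns
num_df <- df %>%
  select(where(is.numeric))

# Drop every column whose name starts with "b"
no_b <- df %>%
  select(-starts_with("b"))

# any_of() silently skips names that are not present
some_df <- df %>%
  select(any_of(c("a", "not_a_column")))

names(num_df)
names(no_b)
names(some_df)
```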
### Using tidyselect in `.by`
`tidyselect` helpers also work when using `.by`:
```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)
df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
```
## Tidy evaluation compatibility
Tidy evaluation can be used to write custom functions that wrap `tidytable` functions.
The embracing shortcut `{{ }}` works, or you can use `enquo()` with `!!` if you prefer:
```{r}
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))
add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
```
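
For comparison, here is a minimal sketch of the same helper written with `enquo()` and `!!`. The function name `add_one_quo` is hypothetical, and `rlang::enquo()` is called with an explicit namespace as an assumption, so the example does not depend on which `rlang` functions `tidytable` re-exports.

```{r}
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one_quo <- function(data, add_col) {
  # Capture the column expression supplied by the caller
  add_col <- rlang::enquo(add_col)
  data %>%
    # Unquote it inside mutate(); the parentheses make the precedence explicit
    mutate(new_col = (!!add_col) + 1)
}

res <- df %>%
  add_one_quo(y)

res
```

Both versions should behave the same here; `{{ }}` simply combines the capture and the unquote into one step.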
The `.data` and `.env` pronouns also work within `tidytable` functions:
```{r}
var <- 10
df %>%
  mutate(new_col = .data$x + .env$var)
```
A full overview of tidy evaluation can be found [here](https://rlang.r-lib.org/reference/topic-data-mask.html).
## `dt()` helper
The `dt()` function makes regular `data.table` syntax pipeable, so you can easily mix `tidytable` syntax with `data.table` syntax:
```{r}
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
```
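
The pipeline above uses only `dt()`. As a sketch of actually mixing the two styles, the example below alternates `tidytable` verbs with a `dt()` step; the object name `mixed` and the column names `double_y` and `total` are hypothetical.

```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

mixed <- df %>%
  filter(x > 1) %>%                  # tidytable verb
  dt(, double_y := y * 2) %>%        # data.table syntax via dt()
  summarize(total = sum(double_y),   # back to a tidytable verb
            .by = z)

mixed
```

This can be handy when a single step is easier to express in `data.table` syntax but the rest of the pipeline reads better with `tidyverse`-style verbs.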
## Speed comparisons
For those interested in performance, speed comparisons can be found [here](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html).
## Acknowledgements
`tidytable` is only possible because of the great contributions to R by the `data.table` and `tidyverse` teams. `data.table` is used as the main data frame engine in the background, while `tidyverse` packages like `rlang`, `vctrs`, and `tidyselect` are heavily relied upon to give users an experience similar to `dplyr` and `tidyr`.