---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  warning = FALSE,
  message = FALSE
)
```

# tidytable  <img id="logo" src="man/figures/logo.png" align="right" width="17%" height="17%" /> 

<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/tidytable)](https://cran.r-project.org/package=tidytable)
![r-universe](https://fastverse.r-universe.dev/badges/tidytable)
[![downloads](http://cranlogs.r-pkg.org/badges/grand-total/tidytable?color=blue)](https://r-pkg.org/pkg/tidytable)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/tidytable?color=blue)](https://markfairbanks.github.io/tidytable/)
[![R-CMD-check](https://github.com/markfairbanks/tidytable/workflows/R-CMD-check/badge.svg)](https://github.com/markfairbanks/tidytable/actions)
<!-- badges: end -->

`tidytable` is a data frame manipulation library for users who need [`data.table` speed](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html) but prefer `tidyverse`-like syntax.

## Installation

Install the released version from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("tidytable")
```

Or install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("markfairbanks/tidytable")
```

## General syntax

`tidytable` replicates `tidyverse` syntax but uses `data.table` in the background. In general, you can simply load `library(tidytable)` to replace your existing `dplyr` and `tidyr` code with `data.table`-backed equivalents.

A full list of implemented functions can be found [here](https://markfairbanks.github.io/tidytable/reference/index.html).

```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
```
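The same applies to `tidyr`-style verbs. As a brief sketch, `pivot_longer()` reshapes the `x` and `y` columns into name/value pairs while keeping `z` as an identifier. The output column names `name` and `value` are illustrative choices.

```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

# Stack columns x and y into long format, keeping z as an identifier
long_df <- df %>%
  pivot_longer(cols = c(x, y), names_to = "name", values_to = "value")

long_df
```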

## Applying functions by group

You can use the normal `tidyverse` `group_by()`/`ungroup()` workflow, or you can use `.by` syntax to reduce typing. Using `.by` in a function is shorthand for `df %>% group_by() %>% some_function() %>% ungroup()`.

* A single column can be passed with `.by = z`
* Multiple columns can be passed with `.by = c(y, z)`

```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
```

All functions that can operate by group (`mutate()`, `filter()`, `summarize()`, etc.) have a built-in `.by` argument.

The above syntax is equivalent to:

```{r}
df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
```

Both options are available, so you can use whichever syntax you prefer.
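`.by` is not limited to `summarize()`. In this short sketch on the same data, `mutate()` computes each row's share of its group total and `filter()` keeps the largest `z` within each group. Both use a single grouping column, `.by = x`; the column name `z_share` is only illustrative.

```{r}
library(tidytable)

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

# Grouped mutate: each z divided by the sum of z within its x group
shares <- df %>%
  mutate(z_share = z / sum(z), .by = x)
shares

# Grouped filter: keep the row(s) with the largest z in each x group
maxes <- df %>%
  filter(z == max(z), .by = x)
maxes
```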

## tidyselect support

`tidytable` lets you select and drop columns just as you would in the `tidyverse`, using the [`tidyselect`](https://tidyselect.r-lib.org) package in the background.

Normal selection can be mixed with all `tidyselect` helpers: `everything()`, `starts_with()`, `ends_with()`, `any_of()`, `where()`, etc.

```{r}
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
```

A full overview of selection options can be found [here](https://tidyselect.r-lib.org/reference/language.html).
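Helpers can also be combined with `-` to drop columns. In this minimal sketch, `where(is.numeric)` picks `a`, `b1`, and `b2`, and `-b2` then removes `b2`, leaving `a` and `b1`. The second call drops every column whose name starts with `"b"`.

```{r}
library(tidytable)

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

# Keep numeric columns, then drop b2
numeric_cols <- df %>%
  select(where(is.numeric), -b2)
numeric_cols

# Drop columns by name pattern
no_b <- df %>%
  select(-starts_with("b"))
no_b
```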

### Using tidyselect in `.by`

`tidyselect` helpers also work when using `.by`:

```{r}
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
```

## Tidy evaluation compatibility

You can use tidy evaluation to write custom functions that wrap `tidytable` verbs.
The embracing shortcut `{{ }}` works, or you can use `enquo()` with `!!` if you prefer:

```{r}
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
```
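The `enquo()`/`!!` route mentioned above might look like the following sketch. The helper name `add_one_enquo()` is hypothetical, and `enquo()` is called with an explicit `rlang::` prefix since it comes from `rlang`, which `tidytable` depends on.

```{r}
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one_enquo <- function(data, add_col) {
  # Capture the caller's column expression as a quosure
  add_col <- rlang::enquo(add_col)
  data %>%
    # Unquote the quosure with !! inside the data-masked expression
    mutate(new_col = (!!add_col) + 1)
}

with_new_col <- df %>%
  add_one_enquo(x)
with_new_col
```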

The `.data` and `.env` pronouns also work within `tidytable` functions:

```{r}
var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
```
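The pronouns matter most when a column and a variable in the calling environment share a name. In this sketch, the `var` column is added purely for illustration: `.data$var` reads the column, while `.env$var` reads the local variable.

```{r}
library(tidytable)

# A data frame whose `var` column shares its name with the `var` variable below
df <- data.table(x = c(1, 1, 1), var = 100)
var <- 10

pronouns <- df %>%
  mutate(from_data = .data$var,  # column value: 100
         from_env = .env$var)    # environment value: 10
pronouns
```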

A full overview of tidy evaluation can be found [here](https://rlang.r-lib.org/reference/topic-data-mask.html).

## `dt()` helper

The `dt()` function makes regular `data.table` syntax pipeable, so you can easily mix `tidytable` syntax with `data.table` syntax:

```{r}
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
```
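Because `dt()` is pipeable, it can sit between `tidytable` verbs in the same pipeline. This minimal sketch filters with `filter()`, adds a column with `data.table`'s `:=` inside `dt()`, and then summarizes with `.by`. The column names `sum_xy` and `avg_sum` are illustrative.

```{r}
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

mixed <- df %>%
  filter(x > 1) %>%                           # tidytable verb
  dt(, sum_xy := x + y) %>%                   # data.table syntax via dt()
  summarize(avg_sum = mean(sum_xy), .by = z)  # back to tidytable
mixed
```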

## Speed comparisons

For those interested in performance, speed comparisons can be found [here](https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html).

## Acknowledgements

`tidytable` is only possible because of the great contributions to R by the `data.table` and `tidyverse` teams. `data.table` is used as the main data frame engine in the background, while `tidyverse` packages like `rlang`, `vctrs`, and `tidyselect` are heavily relied upon to give users an experience similar to `dplyr` and `tidyr`.