Gotchas

Mistakes I have made. And made. And made again. Expect this page to grow as I (re)discover more gotchas.

The `magrittr` dot tension

The tidyverse takes its dot . pronoun from the magrittr package. It means “the thing we are operating on” and is also known as the “argument placeholder”.

You don’t need the dot when you’re using pipe-friendly functions and the planets align for you:

8 %>% log2()
#> [1] 3
## is same as
log2(8)
#> [1] 3
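The rule at work: with no dot on the RHS, x %>% f(y) is evaluated as f(x, y), so each step's result becomes the first argument of the next call. Here is a small sketch chaining two calls. It uses base R functions only, and the input vector is made up:

```r
library(magrittr)

v <- c(1, 4, 9)  # made-up input

## piped: v goes into sqrt(), and that result goes into sum()
piped <- v %>% sqrt() %>% sum()

## the same computation written as nested calls
nested <- sum(sqrt(v))

piped
#> [1] 6
```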

But sometimes the thing you’re piping into the right-hand side (RHS) needs to go somewhere other than the first argument:

2 %>% log(8)
#> [1] 0.3333333
## computes log(2, base = 8), which is not what I want and is not the same as
2 %>% log(8, .)
#> [1] 3
## or 
2 %>% log(8, base = .)
#> [1] 3
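The same contrast is easier to see with paste(), where argument order shows up directly in the output. This is a sketch with made-up strings. When the dot appears as a top-level argument, the LHS goes there and is not also inserted as the first argument:

```r
library(magrittr)

## no dot: "world" becomes the first argument of paste()
no_dot <- "world" %>% paste("hello")
no_dot
#> [1] "world hello"

## dot as a top-level argument: "world" goes exactly there
with_dot <- "world" %>% paste("hello", .)
with_dot
#> [1] "hello world"
```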

And sometimes you want to prevent the left-hand side from being used as the (invisible) first argument on the RHS. In that case, enclose the RHS in curly braces:

iris %>% {
  c(rows = nrow(.), cols = ncol(.))
}
#> rows cols 
#>  150    5
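Why are the braces needed? magrittr only treats the dot as the placeholder when it is a top-level argument of the RHS call. A dot nested inside another call, like nrow(.) or length(.), doesn't count, so the LHS is still injected as the first argument. Here is a sketch with a made-up vector:

```r
library(magrittr)

v <- c(3, 1, 2)

## the dot only appears inside length(), so v is ALSO passed as the first argument:
## this evaluates c(v, n = length(v)), a length-4 vector
without_braces <- v %>% c(n = length(.))

## braces switch off first-argument insertion: this evaluates c(n = length(v))
with_braces <- v %>% { c(n = length(.)) }
```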

One last thing … and this leads to the gotcha. The . can also be used to create a unary function:

att <- . %>% toupper() %>% paste("ALL THE THINGS!")
"open source" %>% att()
#> [1] "OPEN SOURCE ALL THE THINGS!"
"butter" %>% att()
#> [1] "BUTTER ALL THE THINGS!"
"teach" %>% att()
#> [1] "TEACH ALL THE THINGS!"

What is att anyway?

att
#> Functional sequence with the following components:
#> 
#>  1. toupper(.)
#>  2. paste(., "ALL THE THINGS!")
#> 
#> Use 'functions' to extract the individual functions.

It is a “functional sequence”.
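A functional sequence behaves like any other one-argument function. It is only evaluated when you call it. Here is a minimal sketch (the name sum_of_roots is made up for illustration). It also uses magrittr's functions(), the helper mentioned in the printout above:

```r
library(magrittr)

## a pipeline that starts with . is not evaluated; it becomes a function
sum_of_roots <- . %>% sqrt() %>% sum()

is.function(sum_of_roots)
#> [1] TRUE

## call it like any other unary function
sum_of_roots(c(1, 4, 9))
#> [1] 6

## one function per step of the pipeline
length(functions(sum_of_roots))
#> [1] 2
```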

It’s fairly easy to write code where you think . is acting as a placeholder, but it actually creates a functional sequence.

Watch me.

library(purrr)
library(tibble)

x <- list(list(int = 1L, chr = 'a'), list(int = 2L, chr = 'b'))
  
## YES GOOD WORKS
x %>% {
  tibble(id = map_int(., "int"),
         chr = map_chr(., "chr"))
}
#> # A tibble: 2 x 2
#>      id chr  
#>   <int> <chr>
#> 1     1 a    
#> 2     2 b

## NO BAD DOES NOT WORK
x %>% {
  tibble(id = . %>% map_int("int"),
         chr = . %>% map_chr("chr"))
}
#> Error: Columns `id`, `chr` must be 1d atomic vectors or lists

What went wrong?

. %>% map_int("int") built a unary function instead of passing x into map_int(). So tibble() received two functions as column values, which is what the error is complaining about. Do not start a pipeline with . unless you want a unary function.
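To confirm the diagnosis, pull the offending expression out on its own. It is a function, and you only get the integer vector by calling it on x or by piping x in directly. This sketch reuses x from above, and get_ints is a made-up name:

```r
library(magrittr)
library(purrr)

x <- list(list(int = 1L, chr = 'a'), list(int = 2L, chr = 'b'))

## starts with ., so this is a unary function, not a vector
get_ints <- . %>% map_int("int")
is.function(get_ints)
#> [1] TRUE

## to get the vector, call the function on x ...
get_ints(x)
#> [1] 1 2

## ... or pipe x in directly
x %>% map_int("int")
#> [1] 1 2
```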

What does this have to do with purrr?

If you’ve got a complicated object x (e.g., a deeply nested list from JSON), you might build a data frame with repeated calls to map_*() functions. Be careful where you put your dot .!
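Here is a sketch of the safe pattern on a slightly deeper list. The records are hypothetical and only shaped like parsed JSON. Inside the braces, . is the whole list, and each map_*() call gets it as a top-level argument. A character vector such as c("name", "first") tells purrr to extract by name, one level at a time:

```r
library(magrittr)
library(purrr)
library(tibble)

## hypothetical nested records, shaped like parsed JSON
people <- list(
  list(id = 1L, name = list(first = "Ada",  last = "Lovelace")),
  list(id = 2L, name = list(first = "Alan", last = "Turing"))
)

people_df <- people %>% {
  tibble(
    id    = map_int(., "id"),
    first = map_chr(., c("name", "first")),  # people[[i]][["name"]][["first"]]
    last  = map_chr(., c("name", "last"))
  )
}
people_df
```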

`purrr` is strict about types

purrr’s type checking is very strict, which is overwhelmingly positive. But it will force you to be more aware of integer vs. double.

set.seed(4561)
(x <- sample(1:5))
#> [1] 1 4 2 5 3

times_two <- function(x) x * 2
times_two(x)
#> [1]  2  8  4 10  6

x_list <- as.list(x)

## WTF?
x_list %>% 
  map_int(times_two)
#> Error: Can't coerce element 1 from a double to a integer

Why can I suddenly not multiply these numbers by 2?

Because map_int() says to expect integers back. The elements of x_list are integers, but multiplying an integer by the double 2 produces a double, so times_two() doesn't return what map_int() was promised.
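You can check this type promotion in base R with typeof(). In R, a bare 2 is a double literal, and 2L is the integer literal:

```r
x <- 1:5  # : produces an integer vector

typeof(x)
#> [1] "integer"

## integer times double is promoted to double
typeof(x * 2)
#> [1] "double"

## integer times integer stays integer
typeof(x * 2L)
#> [1] "integer"
```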

What can you do? Buckle down and make sure that integer stays integer, if that’s appropriate. Or loosen up and use map_dbl() instead.

## GOOD, in the buckle down sense
times_two <- function(x) x * 2L
x_list %>% 
  map_int(times_two)
#> [1]  2  8  4 10  6

## GOOD, in the loosen up sense
times_two <- function(x) x * 2
x_list %>% 
  map_dbl(times_two)
#> [1]  2  8  4 10  6
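A third option is to leave times_two() alone and convert explicitly inside the function you map. This is a sketch, and times_two_int is a hypothetical name. Be aware that as.integer() truncates non-whole doubles toward zero, so this is only appropriate when you know the results are whole numbers:

```r
library(magrittr)
library(purrr)

x_list <- as.list(c(1L, 4L, 2L, 5L, 3L))
times_two <- function(x) x * 2

## hypothetical wrapper: compute in double, then convert back to integer explicitly
times_two_int <- function(x) as.integer(times_two(x))

x_list %>% 
  map_int(times_two_int)
#> [1]  2  8  4 10  6

## caution: as.integer() silently drops the fractional part
as.integer(2.9)
#> [1] 2
```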
