Load packages

Load purrr and repurrrsive, which contains recursive list examples. If you’re just jumping here, the example datasets are introduced elsewhere, including via interactive listviewer widgets.

library(purrr)
library(repurrrsive)

map() overview

Recall the usage of purrr’s core map() function:

map(.x, .f, ...)
map(VECTOR_OR_LIST_INPUT, FUNCTION_TO_APPLY, OPTIONAL_OTHER_STUFF)

You can provide further arguments via ..., but you don’t have to. The above expands to something like this:

res <- vector(mode = "list", length = length(.x))
res[[1]] <- .f(.x[[1]], ...)
res[[2]] <- .f(.x[[2]], ...)
## and so on, until the end of .x
res

Note that any additional arguments provided via ... are used “as is” in each call to .f. In other words, map() is not vectorized over these arguments. If you need that, check out map2(), pmap(), and friends.
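The expansion above can be written out as an ordinary loop. The sketch below is only an illustration: map_sketch() is a hypothetical helper, not part of purrr, and the input list is made up. It shows the same extra argument (na.rm = TRUE) reaching every call. To vary an argument element by element, it has to become a second mapped input, here via map2_dbl().

```r
library(purrr)

# Hypothetical helper, not part of purrr: a plain loop mirroring the expansion.
map_sketch <- function(.x, .f, ...) {
  res <- vector(mode = "list", length = length(.x))
  for (i in seq_along(.x)) {
    res[[i]] <- .f(.x[[i]], ...)  # the same `...` is passed on every iteration
  }
  res
}

# Made-up input: two numeric vectors, one containing a missing value.
x <- list(c(1, NA, 3), c(4, 5))

by_loop <- map_sketch(x, mean, na.rm = TRUE)
by_map  <- map(x, mean, na.rm = TRUE)
same <- identical(by_loop, by_map)

# To vary na.rm per element, make it a second mapped input.
varied <- map2_dbl(x, c(TRUE, FALSE), ~ mean(.x, na.rm = .y))
```

In the map() call, na.rm = TRUE applies to both elements. In the map2_dbl() call, the first element gets na.rm = TRUE and the second gets na.rm = FALSE.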

map() function specification

One of the main reasons to use purrr is the flexible and concise syntax for specifying .f, the function to apply.

The shortcuts for extracting by name and position are covered thoroughly elsewhere and won’t be repeated here.

We demonstrate three more ways to specify general .f:

  • an existing function
  • an anonymous function, defined on-the-fly, as usual
  • a formula: this is unique to purrr and provides a very concise way to define an anonymous function

We work with the Game of Thrones character list, got_chars. Each character can have aliases, which are stored in a vector in each character’s component. We pull out the aliases for three characters to use as our demo.

aliases <- set_names(map(got_chars, "aliases"), map_chr(got_chars, "name"))
(aliases <- aliases[c("Theon Greyjoy", "Asha Greyjoy", "Brienne of Tarth")])
#> $`Theon Greyjoy`
#> [1] "Prince of Fools" "Theon Turncloak" "Reek"            "Theon Kinslayer"
#> 
#> $`Asha Greyjoy`
#> [1] "Esgred"                "The Kraken's Daughter"
#> 
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth"  "Brienne the Beauty" "Brienne the Blue"

Existing function

Use a pre-existing function. Or, as here, define one ourselves, which gives a nice way to build in our specification for the collapse argument.

my_fun <- function(x) paste(x, collapse = " | ")
map(aliases, my_fun)
#> $`Theon Greyjoy`
#> [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
#> 
#> $`Asha Greyjoy`
#> [1] "Esgred | The Kraken's Daughter"
#> 
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"

Anonymous function, conventional

Define an anonymous function on-the-fly, in the conventional way. Here we put our desired value for the collapse argument into the function definition itself.

map(aliases, function(x) paste(x, collapse = " | ")) 
#> $`Theon Greyjoy`
#> [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
#> 
#> $`Asha Greyjoy`
#> [1] "Esgred | The Kraken's Daughter"
#> 
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"

Alternatively you can simply name the function and provide collapse via ....

map(aliases, paste, collapse = " | ")
#> $`Theon Greyjoy`
#> [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
#> 
#> $`Asha Greyjoy`
#> [1] "Esgred | The Kraken's Daughter"
#> 
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"

Anonymous function, formula

We saved possibly the best for last.

purrr provides a very concise way to define an anonymous function: as a formula. This should start with the ~ symbol and then look like a typical top-level expression, as you might write in a script. Use .x to refer to the input, i.e. an individual element of the primary vector or list.
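One way to see what such a formula stands for is to convert it explicitly with as_mapper(), which purrr exports. This is a sketch on made-up input; collapse_pipe and toy are illustrative names only.

```r
library(purrr)

# as_mapper() converts a formula into an ordinary function.
collapse_pipe <- as_mapper(~ paste(.x, collapse = " | "))

is.function(collapse_pipe)
collapse_pipe(c("Esgred", "The Kraken's Daughter"))

# Made-up named list, standing in for a list of alias vectors.
toy <- list(a = c("x", "y"), b = "z")

# The formula and the conventional anonymous function give the same result.
same_result <- identical(
  map(toy, ~ paste(.x, collapse = " | ")),
  map(toy, function(x) paste(x, collapse = " | "))
)
same_result
```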

map(aliases, ~ paste(.x, collapse = " | "))
#> $`Theon Greyjoy`
#> [1] "Prince of Fools | Theon Turncloak | Reek | Theon Kinslayer"
#> 
#> $`Asha Greyjoy`
#> [1] "Esgred | The Kraken's Daughter"
#> 
#> $`Brienne of Tarth`
#> [1] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"

Workflow advice

It’s rare to write these calls perfect and whole the first time. You should probably pilot your idea on a single element. Then drop your proven, working logic into one of the above templates. When things aren’t working as expected, consider: have you tried to skip too many steps? Pull out an example, get everything to work there, check it on another example, then scale back up again.

A development process for the above might look like this:

(a <- map(got_chars, "aliases")[[19]]) ## OOPS! NULL --> a useless example
#> NULL
(a <- map(got_chars, "aliases")[[16]]) ## ok good
#> [1] "Bran"            "Bran the Broken" "The Winged Wolf"
paste(a, sep = " | ")                  ## OOPS! not what I want
#> [1] "Bran"            "Bran the Broken" "The Winged Wolf"
paste(a, collapse = " | ")             ## ok good
#> [1] "Bran | Bran the Broken | The Winged Wolf"
got_chars[15:17] %>%                   ## I am a programming god
  map("aliases") %>% 
  map_chr(paste, collapse = " | ")
#> [1] "Varamyr Sixskins | Haggon | Lump"                         
#> [2] "Bran | Bran the Broken | The Winged Wolf"                 
#> [3] "The Maid of Tarth | Brienne the Beauty | Brienne the Blue"

List to data frame

Since we’ve simplified the aliases to a single string for each character, we can hold them as an atomic character vector instead of as a list. Wouldn’t it be nice to put that in a data frame, with another variable holding the names? The enframe() function from tibble takes a named vector and promotes the names to a proper variable.
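To see enframe() in isolation, here is a minimal sketch on a made-up named character vector (v and its contents are illustrative only). The names become a column called name, and the value argument sets the name of the second column.

```r
library(tibble)

# Made-up named character vector; the names play the role of character names.
v <- c(first = "a | b", second = "c")

df <- enframe(v, value = "aliases")

names(df)  # column names of the resulting tibble
df$name    # the former names of v, now a regular variable
df
```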

From the top, using four characters to conserve space:

aliases <- set_names(map(got_chars, "aliases"), map_chr(got_chars, "name"))
map_chr(aliases[c(3, 10, 20, 24)], ~ paste(.x, collapse = " | ")) %>% 
  tibble::enframe(value = "aliases")
#> # A tibble: 4 x 2
#>   name            aliases                                                 
#>   <chr>           <chr>                                                   
#> 1 Victarion Grey… The Iron Captain                                        
#> 2 Davos Seaworth  Onion Knight | Davos Shorthand | Ser Onions | Onion Lor…
#> 3 Eddard Stark    Ned | The Ned | The Quiet Wolf                          
#> 4 Aeron Greyjoy   The Damphair | Aeron Damphair

Alternative way to get same data frame

tibble::tibble(
  name = map_chr(got_chars, "name"),
  aliases = got_chars %>% 
    map("aliases") %>% 
    map_chr(~ paste(.x, collapse = " | "))
) %>% 
  dplyr::slice(c(3, 10, 20, 24))
#> # A tibble: 4 x 2
#>   name            aliases                                                 
#>   <chr>           <chr>                                                   
#> 1 Victarion Grey… The Iron Captain                                        
#> 2 Davos Seaworth  Onion Knight | Davos Shorthand | Ser Onions | Onion Lor…
#> 3 Eddard Stark    Ned | The Ned | The Quiet Wolf                          
#> 4 Aeron Greyjoy   The Damphair | Aeron Damphair

This is a very typical workflow: take an unwieldy nested list and, via extraction and/or simplification, produce a more approachable data frame.

Recap

These are the different ways to specify the function .f in the map()-type functions in purrr.

map(aliases, function(x) paste(x, collapse = " | "))
map(aliases, paste, collapse = " | ")
map(aliases, ~ paste(.x, collapse = " | "))

Exercises

Each character can be allied with one of the houses (or with several or with zero). These allegiances are held as a vector in each character’s component.

  1. Create a list allegiances that holds the characters’ house affiliations.
  2. Create a character vector nms that holds the characters’ names.
  3. Apply the names in nms to the allegiances list via set_names.
  4. Re-use the code from above to collapse each character’s vector of allegiances down to a string.
  5. We said that any elements passed via ... would be used “as is”. Specifically they are not used in a vectorized fashion. What happens if you pass collapse = c(" | ", " * ")? Why is that?

Parallel map

map2()

What if you need to map a function over two vectors or lists in parallel?

You can use map2() for that. Here is the usage:

map2(.x, .y, .f, ...)
map2(INPUT_ONE, INPUT_TWO, FUNCTION_TO_APPLY, OPTIONAL_OTHER_STUFF)

map2() has all the type-specific friends you would expect: map2_chr(), map2_lgl(), etc.
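As a quick sketch of those variants, here are three of them applied to two made-up numeric vectors. The suffix determines the type of the returned atomic vector.

```r
library(purrr)

# Made-up inputs of equal length.
x <- c(1, 2, 3)
y <- c(10, 20, 30)

sums    <- map2_dbl(x, y, ~ .x + .y)               # double vector
smaller <- map2_lgl(x, y, ~ .x < .y)               # logical vector
labels  <- map2_chr(x, y, ~ paste(.x, "and", .y))  # character vector

sums
smaller
labels
```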

How will we specify the function to apply? All the usual options are open.

What shall our example be? Each character has a free text field, giving the date and possibly location of his or her birth. Let’s paste that together with the character’s name to get a sentence.

First, obtain the two inputs.

nms <- got_chars %>% 
  map_chr("name")
birth <- got_chars %>% 
  map_chr("born")

Now map over both with an existing function, defined by us.

my_fun <- function(x, y) paste(x, "was born", y)
map2_chr(nms, birth, my_fun) %>% head()
#> [1] "Theon Greyjoy was born In 278 AC or 279 AC, at Pyke"    
#> [2] "Tyrion Lannister was born In 273 AC, at Casterly Rock"  
#> [3] "Victarion Greyjoy was born In 268 AC or before, at Pyke"
#> [4] "Will was born "                                         
#> [5] "Areo Hotah was born In 257 AC or before, at Norvos"     
#> [6] "Chett was born At Hag's Mire"

Anonymous function, conventional form.

map2_chr(nms, birth, function(x, y) paste(x, "was born", y)) %>% head()
#> [1] "Theon Greyjoy was born In 278 AC or 279 AC, at Pyke"    
#> [2] "Tyrion Lannister was born In 273 AC, at Casterly Rock"  
#> [3] "Victarion Greyjoy was born In 268 AC or before, at Pyke"
#> [4] "Will was born "                                         
#> [5] "Areo Hotah was born In 257 AC or before, at Norvos"     
#> [6] "Chett was born At Hag's Mire"

Anonymous function via formula. Use .x and .y to refer to the individual elements of the two primary inputs.

map2_chr(nms[16:18], birth[16:18], ~ paste(.x, "was born", .y)) %>% tail()
#> [1] "Brandon Stark was born In 290 AC, at Winterfell"
#> [2] "Brienne of Tarth was born In 280 AC"            
#> [3] "Catelyn Stark was born In 264 AC, at Riverrun"

pmap()

What if you need to map a function over two or more vectors or lists in parallel?

You can use pmap() for that. Here is the usage:

pmap(.l, .f, ...)
pmap(LIST_OF_INPUT_LISTS, FUNCTION_TO_APPLY, OPTIONAL_OTHER_STUFF)
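A minimal sketch of how pmap() lines up its inputs, using made-up data and a hypothetical function describe(). When the elements of the input list are named, they are matched to the function’s arguments by name. When they are unnamed, they are matched by position.

```r
library(purrr)

# Hypothetical function with two named arguments.
describe <- function(name, n_aliases) {
  paste(name, "has", n_aliases, "aliases")
}

# Made-up inputs. The list names match the argument names of describe(),
# so the order of the list elements does not matter.
inputs <- list(n_aliases = c(3, 2), name = c("Bran", "Asha"))
by_name <- pmap_chr(inputs, describe)

# With an unnamed list, inputs are matched by position; in a formula,
# ..1 and ..2 refer to the first and second inputs.
by_position <- pmap_chr(
  list(c("Bran", "Asha"), c(3, 2)),
  ~ paste(..1, "has", ..2, "aliases")
)

by_name
by_position
```

Because every element of the list is passed to the function, a list containing an element the function has no argument for causes an error unless the function also accepts `...`.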

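As an example, we build a data frame holding each character’s name, aliases, and allegiances. A data frame is a list of columns, so pmap_chr() can walk over its rows, passing each column to the argument of my_fun with the same name.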

df <- got_chars %>% {
  tibble::tibble(
    name = map_chr(., "name"),
    aliases = map(., "aliases"),
    allegiances = map(., "allegiances")
  )
}
my_fun <- function(name, aliases, allegiances) {
  paste(name, "has", length(aliases), "aliases and",
        length(allegiances), "allegiances")
}
df %>% 
  pmap_chr(my_fun) %>% 
  tail()
#> [1] "Kevan Lannister has 1 aliases and 1 allegiances"
#> [2] "Melisandre has 5 aliases and 0 allegiances"     
#> [3] "Merrett Frey has 1 aliases and 1 allegiances"   
#> [4] "Quentyn Martell has 4 aliases and 1 allegiances"
#> [5] "Samwell Tarly has 7 aliases and 1 allegiances"  
#> [6] "Sansa Stark has 3 aliases and 2 allegiances"
