Solving Iteration Problems with purrr

根据purrr-slides进行整理

另外9个由purrr包的开发者British Columbia大学统计学教授兼Rstudio首席科学家Hadley Wickham团队的软件工程师Jenny Bryan撰写: 2, 3, 4, 5, 6, 7, 8, 9, 10


In [4]:
library(tidyverse)
library(repurrrsive)   # devtools::install_github("jennybc/repurrrsive") # or install.packages("repurrrsive")
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 2.2.1     ✔ readr   1.1.1
✔ tibble  1.4.1     ✔ dplyr   0.7.4
✔ tidyr   0.7.2     ✔ stringr 1.2.0
✔ ggplot2 2.2.1     ✔ forcats 0.2.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

For Each Do you are already solving them: copy & paste, for loops, (l/s)apply()

Alternative purrr::map() & friends

https://github.com/tidyverse/purrr

In [5]:
library(purrr)
In [120]:
str(sw_people[1:3])
List of 3
 $ Luke Skywalker:List of 16
  ..$ name      : chr "Luke Skywalker"
  ..$ height    : chr "172"
  ..$ mass      : chr "77"
  ..$ hair_color: chr "blond"
  ..$ skin_color: chr "fair"
  ..$ eye_color : chr "blue"
  ..$ birth_year: chr "19BBY"
  ..$ gender    : chr "male"
  ..$ homeworld : chr "http://swapi.co/api/planets/1/"
  ..$ films     : chr [1:5] "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" "http://swapi.co/api/films/2/" "http://swapi.co/api/films/1/" ...
  ..$ species   : chr "http://swapi.co/api/species/1/"
  ..$ vehicles  : chr [1:2] "http://swapi.co/api/vehicles/14/" "http://swapi.co/api/vehicles/30/"
  ..$ starships : chr [1:2] "http://swapi.co/api/starships/12/" "http://swapi.co/api/starships/22/"
  ..$ created   : chr "2014-12-09T13:50:51.644000Z"
  ..$ edited    : chr "2014-12-20T21:17:56.891000Z"
  ..$ url       : chr "http://swapi.co/api/people/1/"
 $ C-3PO         :List of 14
  ..$ name      : chr "C-3PO"
  ..$ height    : chr "167"
  ..$ mass      : chr "75"
  ..$ hair_color: chr "n/a"
  ..$ skin_color: chr "gold"
  ..$ eye_color : chr "yellow"
  ..$ birth_year: chr "112BBY"
  ..$ gender    : chr "n/a"
  ..$ homeworld : chr "http://swapi.co/api/planets/1/"
  ..$ films     : chr [1:6] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" ...
  ..$ species   : chr "http://swapi.co/api/species/2/"
  ..$ created   : chr "2014-12-10T15:10:51.357000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.309000Z"
  ..$ url       : chr "http://swapi.co/api/people/2/"
 $ R2-D2         :List of 14
  ..$ name      : chr "R2-D2"
  ..$ height    : chr "96"
  ..$ mass      : chr "32"
  ..$ hair_color: chr "n/a"
  ..$ skin_color: chr "white, blue"
  ..$ eye_color : chr "red"
  ..$ birth_year: chr "33BBY"
  ..$ gender    : chr "n/a"
  ..$ homeworld : chr "http://swapi.co/api/planets/8/"
  ..$ films     : chr [1:7] "http://swapi.co/api/films/5/" "http://swapi.co/api/films/4/" "http://swapi.co/api/films/6/" "http://swapi.co/api/films/3/" ...
  ..$ species   : chr "http://swapi.co/api/species/2/"
  ..$ created   : chr "2014-12-10T15:11:50.376000Z"
  ..$ edited    : chr "2014-12-20T21:17:50.311000Z"
  ..$ url       : chr "http://swapi.co/api/people/3/"

map()

map(.x, .f, ...)

for each element of .x do .f

  • .x: a vector, a list, a data frame(for each column)
  • .f: We'll get to that...

How many starships has each character been in?

for each person in sw_people, count the number of starships

map(swpeople, _ )

Strategy

  1. Do it for one element

  2. Turn it into a recipe

  3. Use map() to do it for all elements

In [7]:
luke <- sw_people[[1]]
In [8]:
# How many starships has luke been in?
length(luke$starships)
2
In [9]:
leica <- sw_people[[5]]
length(leica)
15

Turn it into a recipe

In [122]:
map(sw_people[1:5], ~ length(.x$starships))
$`Luke Skywalker`
2
$`C-3PO`
0
$`R2-D2`
0
$`Darth Vader`
1
$`Leia Organa`
0
In [123]:
planet_lookup <- map_chr(sw_planets[1:5],"name") %>%
    set_names(map_chr(sw_planets[1:5],"url"))
planet_lookup
http://swapi.co/api/planets/2/
'Alderaan'
http://swapi.co/api/planets/3/
'Yavin IV'
http://swapi.co/api/planets/4/
'Hoth'
http://swapi.co/api/planets/5/
'Dagobah'
http://swapi.co/api/planets/6/
'Bespin'

Find the name of each characters home world

find the body mass index(BMI) of all characters

bmi = (mass in kg)/(height in m)^2)

In [14]:
luke$homeworld
'http://swapi.co/api/planets/1/'
In [15]:
planet_lookup[luke$homeworld]
http://swapi.co/api/planets/1/: 'Tatooine'
In [124]:
map(sw_people[1:5], ~planet_lookup[.x$homeworld])
$`Luke Skywalker`
NA: NA
$`C-3PO`
NA: NA
$`R2-D2`
NA: NA
$`Darth Vader`
NA: NA
$`Leia Organa`
http://swapi.co/api/planets/2/: 'Alderaan'

ROADmap()

map_lgl(.x, .f, ...)

Other types of output

Other ways of specifying .f

Other iteration functions

map2(.x, .y, .f, ...)

Other types of output

Other ways of specifying .f

Other iteration functions

map() details

map() always returns a list

simple output:

map_lgl(): logical vector

map_int(): integer vector

map_dbl(): double vector

map_chr(): character vector

walk: when you want nothing at all, use a function for its side effects

Result: No surprises!

vector same length as .x or an ERROR

In [23]:
# names can be useful
sw_people <- sw_people %>% set_names(map_chr(sw_people, "name"))

Repalce map() with the appropriately typed function

In [54]:
# How many starships has each character been in?
map(sw_people[1:5], ~length(.x[["starships"]]))
$`Luke Skywalker`
2
$`C-3PO`
0
$`R2-D2`
0
$`Darth Vader`
1
$`Leia Organa`
0
In [53]:
map_int(sw_people[1:5], ~ length(.x[["starships"]]))
Luke Skywalker
2
C-3PO
0
R2-D2
0
Darth Vader
1
Leia Organa
0
In [55]:
# what is color is each character's hair?
map(sw_people[1:5], ~ .x[["hair_color"]])
$`Luke Skywalker`
'blond'
$`C-3PO`
'n/a'
$`R2-D2`
'n/a'
$`Darth Vader`
'none'
$`Leia Organa`
'brown'
In [56]:
map_chr(sw_people[1:5], ~ .x[["hair_color"]])
Luke Skywalker
'blond'
C-3PO
'n/a'
R2-D2
'n/a'
Darth Vader
'none'
Leia Organa
'brown'
In [57]:
# Is the character male?
map(sw_people[1:5], ~ .x[["gender"]] == "male")
$`Luke Skywalker`
TRUE
$`C-3PO`
FALSE
$`R2-D2`
FALSE
$`Darth Vader`
TRUE
$`Leia Organa`
FALSE
In [58]:
# How heavy is each character?
map(sw_people[1:5], ~ .x[["mass"]])
$`Luke Skywalker`
'77'
$`C-3PO`
'75'
$`R2-D2`
'32'
$`Darth Vader`
'136'
$`Leia Organa`
'49'
In [59]:
map_lgl(sw_people[1:5], ~ .x[["gender"]] == "male")
Luke Skywalker
TRUE
C-3PO
FALSE
R2-D2
FALSE
Darth Vader
TRUE
Leia Organa
FALSE
In [60]:
# How heavy is each character?
map_dbl(sw_people, ~ .x[["mass"]])
Error: Can't coerce element 1 from a character to a double
Traceback:

1. map_dbl(sw_people, ~.x[["mass"]])

Doesn't work... because we get a string back

In [61]:
map(sw_people[1:5], ~ .x[["mass"]])
$`Luke Skywalker`
'77'
$`C-3PO`
'75'
$`R2-D2`
'32'
$`Darth Vader`
'136'
$`Leia Organa`
'49'
In [62]:
# A little risky
map_dbl(sw_people[1:5], ~ as.numeric(.x[["mass"]]))
Luke Skywalker
77
C-3PO
75
R2-D2
32
Darth Vader
136
Leia Organa
49
In [64]:
# Probably want something like:
map_chr(sw_people[1:5], ~ .x[["mass"]]) %>% 
    readr::parse_number(na = "unknown")
  1. 77
  2. 75
  3. 32
  4. 136
  5. 49

(1) .f can be a string or integer

for each element, sxtract the named/numbered element

map(.x, .f = "some_name")

equivalent to

map(.x, ~ .x[["some_name"]]

(2) .f can be a function

map(.x, .f = some_function, ...)

equivalent to

map(.x, ~ some_function(.x, ...))

In [68]:
char_starships <- map(sw_people[1:5], "starships")
map_int(char_starships, length)
Luke Skywalker
2
C-3PO
0
R2-D2
0
Darth Vader
1
Leia Organa
0
In [70]:
# In one go
map(sw_people[1:5], "starships") %>% map_int(length)
Luke Skywalker
2
C-3PO
0
R2-D2
0
Darth Vader
1
Leia Organa
0
In [72]:
# also equivalent to
map_int(sw_people[1:5], ~ length(.x[["starships"]]))
Luke Skywalker
2
C-3PO
0
R2-D2
0
Darth Vader
1
Leia Organa
0

From earlier...

create planet_lookup:

In [77]:
planet_lookup <- map_chr(sw_planets[1:5], "name") %>% 
    set_names(map_chr(sw_planets[1:5], "url"))
In [78]:
planet_lookup
http://swapi.co/api/planets/2/
'Alderaan'
http://swapi.co/api/planets/3/
'Yavin IV'
http://swapi.co/api/planets/4/
'Hoth'
http://swapi.co/api/planets/5/
'Dagobah'
http://swapi.co/api/planets/6/
'Bespin'

What about sapply() & lapply()?

What type of object does sapply() return? It depends.

Motivation for purrr:

  • consistent return type,
  • useful shortcuts,
  • consistent syntax for more complicated iteration

Star Wars challenges

In [83]:
# Which film (see sw_films) has the most characters?
map(sw_films[1:5], "characters") %>%
    map_int(length) %>%
    set_names(map_chr(sw_films[1:5], "title")) %>%
    sort()
A New Hope
18
Return of the Jedi
20
The Phantom Menace
34
Revenge of the Sith
34
Attack of the Clones
40
In [84]:
# Which species has the most possible eye colors?
sw_species[[1]]$eye_colors
'yellow, red'
In [87]:
map_chr(sw_species[1:5], "eye_colors") %>% 
    strsplit(", ") %>% 
    map_int(length)
# this is lazy, what about n/a and unkown?
  1. 2
  2. 3
  3. 2
  4. 1
  5. 2

More iteration functions

to each element of .x apply .f

map(.x, .f )

to each element of .x apply .f

walk(.x, .f )

Expect nothing in return

You actually get .x invisibly back, good for piping

to each element of .x and corresponding element of .y apply .f

map2(.x, .y, .f)

Discuss

1.

For each function, which two arguments might be useful to iterate over?

download.file()

rnorm()

lm()

predict.lm()

write.csv()

2.

For which functions above should we use walk2() or a typed version of map2()?

download.file() for each url download to destfile walk2(), map2_int()

rnorm() for each n generate a Normal sample with mean mean (or sd) (See purrr::rerun() for repeating a function many times)

lm() for each data fit a model(formula)

predict.lm() for each model(object), generate predictions at data(newdata)

readr::write_csv() for each data frame(x) save to path. Similar for ggplot::ggsave() for each plot save to filename walk2()

What does this code do?

jan_sales <- read_csv("jan.csv")
jan_sales <- mutate(jan_sales, month = "jan")
feb_sales <- read_csv("feb.csv")
feb_sales <- mutate(feb_sales, month = "feb")
mar_sales <- read_csv("mar.csv")
mar_sales <- mutate(mar_sales, month = "mar")
sales <- bind_rows(jan_sales, feb_sales, mar_sales)

Reduce duplication (and mistakes) with purrr

months <- c("jan", "feb", "mar")

files <- paste0(months, ".csv")

sales_list <- map(files, read_csv)

Now... For each element(do) add a month column

Use the same strategy!

Do it for one

Solve the problem for one element

mutate(sales_list[[1]], month = months[[1]])
mutate(sales_list[[2]], month = months[[2]])

Iterating over two objects!

Do it for all!

months <- c("jan", "feb", "mar")
files <- paste0(months, ".csv")
sales_list <- map(files, read_csv)
sales_list_months <- map2(.x = sales_list,
                          .y = months,
                          .f = ~mutate(.x, month = .y)
bind_rows(sales_list_months)
In [91]:
library(repurrrsive)
gap_split_small <- gap_split[1:10]
countries <- names(gap_split_small)

For each country create a ggplot of life expectancy through time with a title

In [94]:
ggplot(gap_split[[1]], aes(year, lifeExp)) + 
    geom_line() + 
    labs(title = countries[[1]])
In [95]:
# For all countres
plots <- map2(gap_split_small, countries,
             ~ ggplot(.x, aes(year, lifeExp)) + 
                 geom_line() + 
                 labs(title = .y))
In [96]:
plots[[1]]
In [97]:
# Display all plot
walk(plots[1:3], print)

purrr and list columns

purrr and list columns

Data should be in a data frame as soon as it makes sense!

Data frame: cases in rows, variable in columns

purrr can help lists into tibbles

In [103]:
library(tidyverse)
people_tbl <- tibble(
    name = sw_people %>% map_chr("name"),
    films = sw_people %>% map("films"), #will result in list column
    height = sw_people %>% map_chr("height") %>% 
    readr::parse_number(na = "unknown"), #needs some parsing
    species = sw_people %>% map_chr("species", .null = NA_character_)
) #isn't in every element

head(people_tbl,3)
namefilmsheightspecies
Luke Skywalker http://swapi.co/api/films/6/, http://swapi.co/api/films/3/, http://swapi.co/api/films/2/, http://swapi.co/api/films/1/, http://swapi.co/api/films/7/172 http://swapi.co/api/species/1/
C-3PO http://swapi.co/api/films/5/, http://swapi.co/api/films/4/, http://swapi.co/api/films/6/, http://swapi.co/api/films/3/, http://swapi.co/api/films/2/, http://swapi.co/api/films/1/167 http://swapi.co/api/species/2/
R2-D2 http://swapi.co/api/films/5/, http://swapi.co/api/films/4/, http://swapi.co/api/films/6/, http://swapi.co/api/films/3/, http://swapi.co/api/films/2/, http://swapi.co/api/films/1/, http://swapi.co/api/films/7/ 96 http://swapi.co/api/species/2/

combine purrr with dplyr to work with list columns

In [113]:
# A useful lookup table -----------------------------------------------
film_number_lookup <- map_chr(sw_films, "url") %>%
  map(~ stringr::str_split_fixed(.x, "/", 7)[, 6])  %>%
  as.numeric() %>%
  set_names(map_chr(sw_films, "url"))
people_tbl <- tibble(
  name    = sw_people %>% map_chr("name"),
  films   = sw_people %>% map("films"),
  height  = sw_people %>% map_chr("height") %>%
    readr::parse_number(na = "unknown"),
  species = sw_people %>% map_chr("species", .null = NA_character_)
)

# Turning parts of our list to a tibble ---------------------------------
# people_tbl$films

# Use map with mutate to manipulate list columns
people_tbl <- people_tbl %>%
  mutate(
    film_numbers = map(films,
      ~ film_number_lookup[.x]),
    n_films = map_int(films, length)
  )
people_tbl[1:5,] %>% select(name, film_numbers, n_films)
namefilm_numbersn_films
Luke Skywalker6, 3, 2, 1, 7 5
C-3PO 5, 4, 6, 3, 2, 16
R2-D2 5, 4, 6, 3, 2, 1, 77
Darth Vader 6, 3, 2, 1 4
Leia Organa 6, 3, 2, 1, 7 5

Create a new character column that collapses the film numbers into a single string, e.g. for Luke:"6,3,2,1,7"

?paste

In [117]:
people_tbl <- people_tbl %>% 
    mutate(film_squashed = map_chr(film_numbers, paste, collapse = ","))

people_tbl[1:5,] %>% select(name, n_films, film_squashed)
namen_filmsfilm_squashed
Luke Skywalker5 6,3,2,1,7
C-3PO 6 5,4,6,3,2,1
R2-D2 7 5,4,6,3,2,1,7
Darth Vader 4 6,3,2,1
Leia Organa 5 6,3,2,1,7

Lists and functions

Key objects in purrr

purrr provides a pile of functions to make working with them easier

Functions: safely(), possibly(), partial()

Lists: transpose(),accumulate(),reduce(),every(),order_by()

Wrap up

purrr provides:

  • functions that write for loops for you
  • with consistent syntax, and
  • convenient shortcuts for specifying functions to iterate

Choosing the right function depends on:

  • type of iteration
  • type of output