Episode

Scraping data with rvest and purrr

with Max Humber

useR!2017: Scraping data with rvest and purrr

Keywords: rvest, purrr, webscraping, fantasy, sports
Webpages: http://www.maxhumber.com
Really interesting data never actually lives inside of a tidy CSV. Unless, of course, you think iris or mtcars is super interesting. Interesting data lives outside of comma separators. It's unstructured, and messy, and all over the place. It lives around us and on poorly formatted websites, just waiting and begging to be played with.
Finding and fetching and cleaning your own data is a bit like cooking a meal from scratch instead of microwaving a frozen TV dinner. Microwaving food is simple. It's literally one step: put thing in microwave. There is, however, no single step to making a proper meal from scratch. Every meal is different. The recipe for making coconut curry isn't the same as the recipe for Brussels sprout tacos. But both require a knife and a frying pan!
In "Scraping data with rvest and purrr" I will talk through how to pair and combine rvest (the knife) and purrr (the frying pan) to scrape interesting data from a bunch of websites. This talk is inspired by a recent blog post that I wrote for the r-bloggers.com community, where it was well received.
rvest is a popular R package that makes it easy to scrape data from HTML web pages.
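To give a rough feel for the knife, here is a minimal sketch of pulling a table out of a page with rvest. The HTML, the `#players` id, and the player names are made up for illustration and inlined so the example runs without a network connection; it also assumes rvest 1.0 or later, where `html_element()` is available.

```r
library(rvest)

# Hypothetical page, inlined as a string so no real website is needed
page <- read_html('
  <html><body>
    <table id="players">
      <tr><th>player</th><th>points</th></tr>
      <tr><td>Alpha</td><td>10</td></tr>
      <tr><td>Bravo</td><td>7</td></tr>
    </table>
  </body></html>')

# Select the table with a CSS selector, then parse it into a data frame
players <- page %>%
  html_element("#players") %>%
  html_table()

players
```

For a real site, the string would be replaced by a URL and the selector by whatever matches the table on that page.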
purrr is a relatively new package that makes it easy to write code for a single element of a list that can be quickly generalized to the rest of that same list.
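As a sketch of how the two fit together: write the scraping code for one page first, then let purrr apply it to every page in a list. The `pages` list, its position names, and the `scrape_one()` helper are hypothetical and exist only for this example; inline HTML strings stand in for real URLs, and rvest 1.0 or later is assumed.

```r
library(rvest)
library(purrr)

# Hypothetical pages keyed by position; strings stand in for real URLs
pages <- list(
  QB = "<table><tr><th>player</th><th>points</th></tr>
        <tr><td>Alpha</td><td>21</td></tr></table>",
  RB = "<table><tr><th>player</th><th>points</th></tr>
        <tr><td>Bravo</td><td>14</td></tr>
        <tr><td>Charlie</td><td>9</td></tr></table>"
)

# Step 1: write the code for a single element of the list
scrape_one <- function(html) {
  read_html(html) %>%
    html_element("table") %>%
    html_table()
}
scrape_one(pages[[1]])

# Step 2: generalize to the rest of the list with map()
tables <- map(pages, scrape_one)

# Typed variants such as map_int() return a plain vector instead of a list
row_counts <- map_int(tables, nrow)
```

Because `map()` keeps the list names, each scraped table stays labeled by the page it came from.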