R User Day at Data Day Texas

For several years, we've received a multitude of requests from the data community to increase the Data Day coverage of the R language and environment (wikipedia). This year, we decided to do something about it.

Imagine if Austin had a world class R User Conference.

Early this spring, our friend Daniel Woodie took over as organizer of the Austin R User Group. Seeing the great job he is doing to revive the group and bring in good content, we asked him if he would help us curate an R User track for Data Day Texas. After reviewing our lists of potential speakers, we decided not to limit R to an individual track, but to go all the way -- and create a mini R conference within Data Day Texas. We will set aside an entire portion of the conference facility for the R community. You will not need to buy a separate ticket for R User Day. Your Data Day Texas ticket gets you into all the content for R User Day as well.

Confirmed Talks for R User Day

Introduction to SparkR in AWS EMR (90 minute session)

Alex Engler - Urban Institute

This session is a hands-on tutorial on working in Spark through R and RStudio in AWS Elastic MapReduce (EMR). The demonstration will cover how to launch and access Spark clusters in EMR with R and RStudio installed. Participants will be able to launch their own clusters and run Spark code during an introduction to SparkR, including the sparklyr package, for data science applications. Theoretical concepts of Spark, such as the directed acyclic graph and lazy evaluation, as well as mathematical considerations of distributed methods, will be interspersed throughout the training. Follow-up materials on launching SparkR clusters and tutorials in SparkR will be provided.
Intended Audience: R users who are interested in a first foray into distributed cloud computing for the analysis of massive datasets. No big data, dev ops, or Spark experience is required.
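
For readers new to the lazy evaluation idea mentioned in the abstract, here is a minimal sketch in plain Python. It is not Spark, SparkR, or sparklyr code, and it is not taken from the session materials; the LazyDataset class and its method names are hypothetical, chosen only for illustration. Transformations are recorded as a plan (here a simple linear chain, the simplest case of a directed acyclic graph), and nothing is computed until collect() is called.

```python
class LazyDataset:
    """Toy stand-in for a distributed dataset (hypothetical, not a Spark API).

    Transformations are only recorded; work happens when collect() is called.
    """

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)  # the recorded plan: a linear chain of steps

    def map(self, fn):
        # Return a new dataset with one more recorded step; nothing runs here.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Same idea for filtering: extend the plan, do no work.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # The "action": walk the recorded plan and actually compute the result.
        rows = iter(self._source)
        for kind, fn in self._steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []  # records every input the transformation actually touches


def square(x):
    calls.append(x)
    return x * x


plan = LazyDataset(range(6)).map(square).filter(lambda v: v % 2 == 0)
ran_before_collect = len(calls)  # still 0: building the plan did no work
result = plan.collect()          # only now does square() run
print(ran_before_collect, result)
```

Deferring the work this way is what gives an engine the chance to inspect the whole plan before running it; the sketch only demonstrates the deferral, not any optimization.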

Something old, something new, something borrowed, something blue: Ways to teach data science (and learn it too!)

Albert Y. Kim - Amherst College

How can we help newcomers take their first steps into the world of data science and statistics? In this talk, I present ModernDive: An Introduction to Statistical and Data Sciences via R, an open source, fully reproducible electronic textbook available at ModernDive.com, co-authored by myself and Chester Ismay, Data Science Curriculum Lead at DataCamp. ModernDive’s authoring follows a paradigm of “versions, not editions” much more in line with software development than traditional textbook publishing, as it is built using RStudio’s bookdown interface to R Markdown. In this talk, I will present details on our book’s construction, our approaches to teaching novices to use tidyverse tools for data science (in particular ggplot2 for data visualization and dplyr for data wrangling), how we leverage these data science tools to teach data modeling via regression, and preview the new infer package for statistical inference, which performs statistical inference using an expressive syntax that follows tidy design principles. We’ll conclude by presenting example vignettes and R Markdown analyses created by undergraduate students to demonstrate the great potential yielded by effectively empowering new data scientists with the right tools.

Confirmed Speakers for R User Day

Mara Averick (Boston) @dataandme

Mara Averick (LinkedIn / GitHub / Medium) is a polymath and self-confessed data nerd. With a strong background in research, she has a breadth of experience in data analysis, visualization, and applications thereof. Currently, by day, she’s a Consultant at TCB Analytics. By night, you’ll find her sharing dope R related stuff on Twitter and translating heavily technical subject matter into easy reading for a non-technical audience. When she’s not talking data, she's diving into NBA stats, exploring weird and wonderful words, and/or indulging in her obsession with all things Archer. (Thanks to Mango Solutions for bio.)

Jasmine Dumas (Connecticut) @jasdumas

Jasmine Dumas (LinkedIn / GitHub) is a Data Scientist at Simple Finance where she is focused on experimentation and data product development. She earned a B.S.E. in Biomedical Engineering from the University of Hartford and has experience in Aerospace Manufacturing, Medical Devices and Financial Technology. She is an active member of the R programming community, has developed the open source packages shinyGEO, ttbbeer, shinyLP, and gramr, and has participated in Google Summer of Code, NASA Datanauts, R-Ladies, and Forwards. She is currently developing a course on shiny with DataCamp and co-organizing the regional Noreast'R Conference.

Alex Engler (Washington, D.C.) @alexcengler

Alex Engler (LinkedIn / GitHub / Urban Institute / Georgetown / Johns Hopkins) is the Program Director and Lecturer for the M.S. in Computational Analysis and Public Policy program at the University of Chicago. He is also a contributing data scientist to the Urban Institute, where he worked before UChicago. Alex also previously taught visualization and data science for policy analysis at Georgetown University and Johns Hopkins University.
Alex will be presenting the following workshop: Introduction to SparkR in AWS EMR, as part of R User Day.

Chester Ismay (Portland) @old_man_chester

Chester Ismay (LinkedIn / GitHub) is Curriculum Lead at DataCamp. He was formerly an Adjunct Professor of Sociology at Pacific University and an Instructional Technologist and Consultant for Data Science, Statistics, and R at Reed College. He obtained his PhD in statistics from Arizona State University and has taught courses and led workshops in statistics, data science, mathematics, computer science, and sociology. He is the co-author of the fivethirtyeight R data package and is the author of the thesisdown R package. He is also a co-author of an open source textbook entitled ModernDive: An Introduction to Statistical and Data Sciences via R.

Albert Y. Kim (Amherst) @rudeboybert

Albert Y. Kim (LinkedIn / GitHub) is a Lecturer in Statistics in the Mathematics & Statistics Department at Amherst College. Born in Montreal Quebec, he earned his BSc in Mathematics and Computer Science from McGill University in 2004 and his PhD in Statistics from the University of Washington in 2011. Prior to joining Amherst College, he was a Decision Support Engineering Analyst in the AdWords division of Google Inc, a Visiting Assistant Professor of Statistics at Reed College, and an Assistant Professor of Statistics at Middlebury College.
Albert will be giving the following R User Day talk: Something old, something new, something borrowed, something blue: Ways to teach data science (and learn it too!).

Jared Lander (NYC) @jaredlander

Jared Lander (LinkedIn) is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master's from Columbia University in statistics and a bachelor's from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fund raising to finance and humanitarian relief efforts.
Jared specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Lucy McGowan (Nashville) @LucyStats

Lucy D'Agostino McGowan (LinkedIn / GitHub) is a Biostatistics PhD candidate at Vanderbilt University where her research focuses on observational studies, large-scale inference, and methods for quantifying and estimating the effect of unmeasured confounding. She is the co-founder of R-Ladies Nashville and is enthusiastic about learning from and uplifting other women in the R and STEM communities.

Jessica Minnier (Portland) @datapointier

Jessica Minnier (LinkedIn / GitHub) is an Assistant Professor of Biostatistics at Oregon Health & Science University. She is a faculty member of the OHSU-PSU School of Public Health with appointments in the Knight Cardiovascular Institute and Knight Cancer Institute Biostatistics Shared Resource. Her statistical research interests include risk prediction with high dimensional data sets and the analysis of genetic and other omics data. She is also interested in statistical computing (mostly in R), reproducible research and open science.
Jessica teaches Mathematics/Statistics II, a statistical inference course for the MS in Biostatistics program at OHSU-PSU School of Public Health. Jessica has an A.M. and Ph.D. in Biostatistics from Harvard University and a B.A. in Mathematics with a minor in Computer Science from Lewis & Clark College.

Jonathan Nolis (Seattle) @skyetetra

Jonathan Nolis (LinkedIn / GitHub) is the Director of Insights & Analytics at Lenati, and is the lead of the Customer Insights & Analytics team. He has over a decade of experience in solving business problems using data science. Jonathan has provided insights and strategic advice in industries such as retail, manufacturing, aerospace, health care, and e-commerce. Jonathan helps create proprietary technology for Lenati including the Loyalty Program ROI Simulator – a tool that uses big data to predict the value of a loyalty program. He has a PhD in industrial engineering, and has several academic publications in the field of applied optimization. Prior to joining Lenati, Jonathan was a Lead of Advanced Analytics at Promontory Financial Group, a regulatory compliance consulting firm.

Hilary Parker (San Francisco) @hspter

Hilary Parker (LinkedIn / GitHub) is a Data Scientist at Stitch Fix and co-host of the Not So Standard Deviations podcast. She is an R and statistics enthusiast determined to bring rigor to analysis wherever she goes. At Stitch Fix she works on teasing apart correlation from causation, with a strong dose of reproducibility. Formerly a Senior Data Analyst at Etsy, she received a PhD in Biostatistics from the Johns Hopkins Bloomberg School of Public Health.

Gabriela de Queiroz (San Francisco) @gdequeiroz

Gabriela de Queiroz (LinkedIn / GitHub) is the Lead Data Scientist at SelfScore. Formerly, Gabriela was a data scientist at Sharethrough, where she developed statistical models from concept creation to production, designed, ran, and analyzed experiments, and employed a variety of techniques to derive insights and drive data-centric decisions. Gabriela is the founder of R-Ladies, an organization created to promote diversity in the R community, which now has over 25 chapters worldwide. Currently, she is developing an online course on machine learning in partnership with DataCamp.

David Robinson (NYC) @drob

David Robinson (LinkedIn / GitHub) is a data scientist at Stack Overflow with a PhD in Quantitative and Computational Biology from Princeton University. He enjoys developing open source R packages, including broom, gganimate, fuzzyjoin and widyr, as well as blogging about statistics, R, and text mining on his blog, Variance Explained.

Emily Robinson (NYC) @robinson_es

Emily Robinson (LinkedIn / GitHub) works as a Data Analyst at Etsy with the search team to design, implement, and analyze experiments on the ranking algorithm, UI changes, and new features. Emily earned her master's in Organizational Behavior from INSEAD in 2016 and her bachelor’s in Decision Sciences from Rice University (where she took classes from Hadley Wickham). She's a co-organizer of the NYC chapter of R-Ladies, a global organization to promote gender diversity in the R community. She enjoys blogging about A/B Testing, conferences, and data science projects on her blog, Hooked on Data.

Julia Silge (Salt Lake City) @juliasilge

Julia Silge (LinkedIn / GitHub) is a data scientist at Stack Overflow. She enjoys making beautiful charts, the statistical programming language R, black coffee, red wine, and the mountains of her adopted home here in Utah. She has a PhD in astrophysics and an abiding love for Jane Austen. Her work involves analyzing and modeling complex data sets while communicating about technical topics with diverse audiences.

Nick Strayer (Nashville) @NicholasStrayer

Nick Strayer (LinkedIn / GitHub) has worked in many different realms, including as a journalist at the New York Times, a data scientist at Dealer.com in Vermont, and a "data artist in residence" at tech startup Conduce in California. Currently, he is a PhD student in biostatistics at Vanderbilt University and also an intern at the Johns Hopkins Data Science Lab. Recently (May '15), he graduated from the University of Vermont where he majored in mathematics and statistics and minored in computer science.
Nick likes data. Manipulating it, modeling it, making it (simulation), visualizing it and yes, even cleaning it. He does these things with some combination of R, Python and Javascript (d3.js in particular). Most recently he has been fascinated with conveying complex statistical topics and methods using intuitive and interactive graphics.
Nick's current research interests include data gathering, extracting inference from machine learning, data visualization and scientific communication. When not in "school mode" Nick loves to bike places, read science fiction and wander around gardens/museums.

David Smith (Chicago) @revodavid

David Smith is the R Community Lead at Microsoft. With a background in data science, he writes daily about applications of predictive analytics at the Revolutions blog (blog.revolutionanalytics.com), and is a co-author of Introduction to R.

Daniel Woodie (Austin) @DanielWoodie5

Daniel Woodie is founder and lead scientist of Bamboo Analytics, a data science services firm. He trained originally as a statistician and has worked on applications ranging from systems neuroscience to global supply chains. With Bamboo Analytics he offers analytical consulting and training to early stage startups and Fortune 500 companies alike.
Daniel will be emcee for R User Day at Data Day Texas.

Author Signings at R User Day

Text Mining with R (O'Reilly Media)

Text Mining with R, by Julia Silge and David Robinson, shows you how to manipulate, summarize, and visualize the characteristics of text, sentiment analysis, tf-idf, and topic modeling. Along with tidy data methods, you’ll also examine several beginning-to-end tidy text analyses on data sources from Twitter to NASA datasets. These analyses bring together multiple text mining approaches covered in the book.
Tackle a variety of tasks in natural language processing by learning how to use the R language and tidy data principles. This practical guide provides examples and resources to help you get up to speed with dplyr, broom, ggplot2, and other tidy tools from the R ecosystem. You’ll discover how tidy data principles can make text mining easier, more effective, and consistent by employing tools already in wide use.
Get real-world examples for implementing text mining using tidy R packages
Understand natural language processing concepts like sentiment analysis, tf-idf, and topic modeling
Learn how to analyze unstructured, text-heavy data using the R language and ecosystem
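
For readers unfamiliar with tf-idf, one of the concepts named in the book description, here is a minimal sketch in plain Python. It is not code from the book and does not use any R package; the tf_idf function and the two toy documents are hypothetical. It assumes one common definition: term frequency is a term's count divided by the total number of terms in its document, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the term.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf for a dict mapping document name -> list of tokens.

    Returns a dict mapping (document name, term) -> score.
    Assumed definition: tf = count / total terms in the document,
    idf = ln(number of documents / number of documents containing the term).
    """
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)

    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        for term, n in c.items():
            tf = n / total
            idf = math.log(n_docs / doc_freq[term])
            scores[(name, term)] = tf * idf
    return scores


# Two tiny made-up documents, already split into tokens.
docs = {
    "a": "the cat sat on the mat".split(),
    "b": "the dog sat on the log".split(),
}
scores = tf_idf(docs)

# Terms shared by every document score zero; terms unique to one document score highest.
for (name, term), score in sorted(scores.items()):
    print(name, term, round(score, 4))
```

Under this definition, a word that appears in every document gets an idf of zero, which is why very common words drop out and document-specific words stand out.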