Who will be speaking at Data Day 2015?

Data Day 2014 brought an incredible group of speakers in the data space to Austin. Among those speaking and leading workshops were Michael Berthold, Gary Dusbabek, Christopher Johnson, Russell Jurney, Steve Kramer, Eric Lubow, Charity Majors, Paco Nathan, Sam Ritchie, Matthew Russell, Eric Sammer, Joe Stein, Eric Tschetter, and Josh Wills.

We are currently selecting speakers for 2015. Below are some of the speakers who have already confirmed talks and workshops. If you would like to speak at Data Day, instructions for submitting a proposal are on our proposals page.

Confirmed speakers for Data Day Texas 2015

Michael Berthold (Zürich, Switzerland)

One of the most requested speakers at last year's Data Day, Michael Berthold will return in 2015 to share the latest news about KNIME. Since August 2003, Michael has held the Nycomed-Chair for Bioinformatics and Information Mining at Konstanz University, Germany, where his research focuses on using machine learning methods for the interactive analysis of large information repositories in the life sciences. Most of the research results are made available to the public via the open source data mining platform KNIME. Michael is Past President of the North American Fuzzy Information Processing Society, Associate Editor of several journals, and President of the IEEE Systems, Man, and Cybernetics Society. He has been involved in organizing various conferences, most notably the IDA series of symposia on Intelligent Data Analysis and the conference series on Computational Life Science. Together with David Hand, he co-edited the textbook Intelligent Data Analysis: An Introduction, which has recently appeared in a completely revised second edition. He is also co-author of Guide to Intelligent Data Analysis (Springer-Verlag), which appeared in summer 2010.
While in town, Michael will also be leading full-day beginner and advanced workshops in KNIME.

Oscar Boykin (San Francisco)

Oscar Boykin, trained as a mathematical physicist and formerly a professor at the University of Florida, currently works at Twitter as a staff data scientist. Oscar is a co-author of Algebird, Scalding, and Summingbird.
Comfortable in Java, C++, C, C#, Python, Haskell, Scala, and many other programming languages, Oscar has an extensive background developing computational science codes as well as systems programming experience with large-scale distributed systems. He also has a strong background in statistics, probability, and information theory, which he applies to machine learning and data mining problems.
Twitter: @posco

Julia Evans (Montreal)

Julia Evans lives in Montréal and works on Stripe's machine learning team. She is a Hacker School alumna, keeps a blog (http://jvns.ca/), and likes learning about operating system internals. When not programming, Julia spends a lot of time thinking about how to organize amazing conferences (like @bangbangcon) and make scary concepts more accessible, as well as writing about surprising programming things she has learned recently. Julia is also a co-organizer of @mtlgirlhackers.
Twitter: @b0rk

Chris Fregly (San Francisco)

Chris Fregly, author of the upcoming Manning publication Spark in Action, is a 15-year software veteran who has focused mainly on global media and entertainment companies. Previously, Chris was a Senior Engineer on the Netflix Streaming Platform, where he worked with both the Hadoop and Cassandra ecosystems. Chris started his career at BEA Systems, where his team built the WebLogic Portal, Commerce, and Personalization products from scratch. Most recently, Chris built the Amazon Kinesis adapter for Spark Streaming.
Twitter: @cfregly

Chris Johnson (NYC)

Chris Johnson will return to Data Day this year to talk about Spotify's work with Apache Spark, specifically Spark for music recommendation. Spotify was one of the first companies to deploy Spark at large scale in the enterprise.
Chris Johnson is a machine learning engineer at Spotify, where he hacks on music data, builds the best music recommendation system on the planet, and feeds multiple terabytes of data to Hadoop every day. Chris's toolchest includes Python, NumPy, scikit-learn, Hadoop, Hive, Java, Cassandra, and Storm.
As both a researcher and an engineer, Chris is interested in high-dimensional problems and in efficient methods for scaling learning in the presence of massive data sets. He is particularly interested in the scalability, design, and architecture decisions that arise within real-time recommender systems such as music recommendation. His research has been featured at premier machine learning conferences, including NIPS and AISTATS.
In his free time Chris is an avid rock climber, photographer, and music lover who enjoys traveling across the world to remote climbing destinations, experiencing music from a multitude of cultures, and snapping a corpus of photos along the way.
Chris holds MS and BS degrees from UT Austin.
Twitter: @MrChrisJohnson


Matthew Kirk (Seattle)

Matthew Kirk is a quant turned programmer who holds a B.S. in Economics and a B.S. in Applied and Computational Mathematical Sciences with a concentration in Quantitative Economics from the University of Washington. He started Modulus 7, a data science and Ruby development consulting firm, in early 2012. Matthew has spoken around the world about using machine learning and data science with Ruby.
Matthew is the author of the upcoming O'Reilly book, Thoughtful Machine Learning.
Twitter: @mjkirk

Nathan Marz (SF Bay)

Nathan Marz was the Lead Engineer at BackType before BackType was acquired by Twitter in July 2011. At Twitter, Nathan started the streaming compute team, which provides infrastructure that supports many critical applications throughout the company. Nathan left Twitter in March 2013 to start his own company (currently in stealth).
Nathan created the Storm and Cascalog projects and has many other projects on his GitHub page. His projects are relied upon by over 50 companies around the world, including Yahoo!, Twitter, Groupon, The Weather Channel, Taobao, and many others.
Nathan is working on a book called Big Data: Principles and Best Practices of Scalable Realtime Data Systems for Manning Publications.
Twitter: @nathanmarz

Paco Nathan (Mountain View)

Paco Nathan has been a speaker and advisor to Data Day since 2012. Once again, Paco will be sharing the latest data news from around the globe.
Paco Nathan is a "player/coach" who has led innovative data teams building large-scale apps for several years, with expertise in distributed systems, machine learning, cloud computing, and functional programming. Paco is an O'Reilly author, currently an open source evangelist for Apache Spark with Databricks, and an advisor for Amplify Partners. His current interests include enterprise data workflows, math literacy among executives, and the intersection of Ag+Data. He received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 30+ years of technology industry experience ranging from Bell Labs to early-stage start-ups. Most recently, Paco published his Just Enough Math video for O'Reilly Media.
Twitter: @pacoid
While Paco is in town, he will be offering multiple full day training sessions through GeekAustin. Details to follow.

Matthew Russell (Nashville)

Matthew Russell was one of the highest-rated speakers at Data Day 2014, and attendees asked us to bring him back for 2015. This time, in addition to speaking, Matthew will be offering a full-day workshop. Details on the workshop will be announced.
Matthew is Chief Technology Officer at Digital Reasoning, Principal at Zaffra, and author of several books on technology, including Mining the Social Web (O'Reilly, 2013), now in its second edition. He is passionate about open source software development, data mining, and creating technology to amplify human intelligence. Matthew studied computer science and jumped out of airplanes at the United States Air Force Academy. When not solving hard problems, he enjoys practicing Bikram Hot Yoga, doing CrossFit, and participating in triathlons.
Twitter: @ptwobrussell

Eric Sammer (San Francisco)

Eric Sammer is currently CTO of ScalingData. Previously, Eric served as an Engineering Manager at Cloudera, responsible for developer tools and partner integrations. Eric’s team worked with hundreds of partners to develop robust solutions and integrate them tightly with Cloudera’s Enterprise Data Hub. Before that, he was a Principal Solutions Architect, working with customers and strategic partners to support and integrate Hadoop clusters and related infrastructure. While working with some of Cloudera’s largest customers, Eric developed many of the best practices for building large, distributed data processing infrastructure.
Eric is a committer on the Apache Flume and MRUnit projects and the creator of the Kite open source project. Prior to Cloudera, Eric served as a Senior Engineer and Architect at several large-scale, data-driven organizations, including Experian and Conductor. Eric is the author of Hadoop Operations, published by O’Reilly Media. He speaks frequently on technology and techniques for large-scale data processing, integration, and system management.
Twitter: @esammer

Dean Wampler (Chicago)

Dean Wampler is a software developer, new data scientist, technical author, and frequent public speaker living in Chicago. Dean is the author of Functional Programming for Java Developers, the co-author (with Alex Payne) of Programming Scala, and the co-author (with Edward Capriolo and Jason Rutherglen) of Programming Hive, all published by O'Reilly Media. Many of his conference and user group presentations can be found at his Polyglot Programming site. Dean also helps organize several conferences and started the Chicago-Area Scala Enthusiasts user group.
Twitter: @deanwampler

Highlights from 2014

Above: Paco Nathan, author of Enterprise Data Workflows with Cascading, discussing the killing fields of calculus, from his upcoming video, Just Enough Math.

Above: Christopher Johnson, Machine Learning Engineer at Spotify, discussing algorithmic music discovery.

Above: Charity Majors, Production Engineering Manager at Parse/Facebook, discussing how she gets MongoDB, Cassandra, MySQL, Redis, and Hive to play together in a heterogeneous environment.