2014 speakers

Michael Berthold (Zürich, Switzerland)

Michael Berthold has held the Nycomed-Chair for Bioinformatics and Information Mining at Konstanz University, Germany, since August 2003. His research focuses on using machine learning methods for the interactive analysis of large information repositories in the Life Sciences. Most of the research results are made available to the public via the open source data mining platform KNIME (wikipedia entry). M. Berthold is Past President of the North American Fuzzy Information Processing Society, Associate Editor of several journals and the President of the IEEE Systems, Man, and Cybernetics Society. He has been involved in the organization of various conferences, most notably the IDA-series of symposia on Intelligent Data Analysis and the conference series on Computational Life Science. Together with David Hand he co-edited the textbook Intelligent Data Analysis: An Introduction, which has recently appeared in a completely revised second edition. He is also co-author of Guide to Intelligent Data Analysis (Springer Verlag), which appeared in summer 2010.
Linkedin
In addition to hosting a hands-on workshop on KNIME, Michael will be holding office hours and participating in a data mining birds-of-a-feather meeting.

Gary Dusbabek (San Antonio)

Gary Dusbabek is an Apache Cassandra committer and PMC member, as well as a life-long programmer specializing in distributed systems. His past experience includes working with large-scale text and image indexes in the newspaper industry and high-volume advertisement booking software. Recent work at Rackspace includes working on Cassandra full-time and being a founding member of the Cloud Monitoring team. Gary currently works on the Rackspace Service Registry.
Linkedin
One Man Clapping
Twitter: @gdusbabek

Russell Jurney (SF Bay)

Russell Jurney cut his data teeth in casino gaming, building web apps to analyze the performance of slot machines in the US and Mexico. After dabbling in entrepreneurship, interactive media and journalism, he moved to Silicon Valley to build analytics applications at scale at Ning and LinkedIn. Russell is author of the fresh-off-the-press O'Reilly book Agile Data Science. He lives on the ocean in Pacifica, California with his wife Kate and two fuzzy dogs.
Linkedin
Data Syndrome
Twitter: @rjurney

Chris Johnson (NYC)

Chris Johnson is a machine learning engineer at Spotify where he hacks on music data, builds the best music recommendation system on the planet, and feeds multiple terabytes of data to Hadoop every day. Chris's toolchest includes Python, NumPy, scikit-learn, Hadoop, Hive, Java, Cassandra, and Storm.
As both a researcher and an engineer Chris is interested in problems of high dimension and efficient methods of scaling learning under the presence of massive data sets. He is particularly interested in the scalability, design, and architecture decisions that arise within real-time recommender systems such as music recommendation. His research has been featured at premier Machine Learning conferences including NIPS and AISTATS.
In his free time Chris is an avid rock climber, photographer, and music lover who enjoys traveling across the world to remote climbing destinations, experiencing music from a multitude of cultures, and snapping a corpus of photos along the way.
Chris holds MS and BS degrees from UT Austin.
Linkedin
UT CS Webpage
Twitter: @MrChrisJohnson

Steve Kramer (Austin)

Steve Kramer is the President and Chief Scientist of Paragon Science, a company he founded with the goal of developing cutting-edge technologies to aid in the counter-terrorism efforts of the United States. He has since expanded Paragon Science's scope to focus on providing valuable business intelligence in the commercial data-mining industry.
In 2005, Dr. Kramer started his current research in graph theory, network analysis, and complex systems theory, yielding Paragon's patent-pending dynamic anomaly detection technologies. He has performed data-mining consulting work for multiple clients, including The Advisory Board, Digital Motorworks, RetailMeNot, and Vast.com. He presented his paper "Anomaly detection in extremist web forums using a dynamical systems approach" at the 2010 ACM SIGKDD Workshop on Intelligence and Security Informatics (ISI-KDD 2010) and at the Pentagon. He also recently served as a program committee member and paper reviewer for IEEE International Conferences on Intelligence and Security Informatics 2011, IEEE International Conferences on Intelligence and Security Informatics 2012, ACM SIGKDD Workshop on Intelligence and Security Informatics 2012, IEEE Intelligence and Security Informatics 2013, and FOSINT-SI 2013 (International Symposium on Foundations of Open Source Intelligence and Security Informatics).
Linkedin

Eric Lubow (NYC)

Eric Lubow is CTO of SimpleReach, where he builds highly scalable distributed systems for processing social data. He began his career building secure Linux systems. Since then he has worked on building and administering various types of ad systems, maintaining and deploying large-scale web applications, and building email delivery and analytics systems. Eric is also a DataStax MVP for Apache Cassandra, and co-author of the upcoming book Practical Cassandra, a Developer's Approach. In his spare time, Eric is a skydiver, BASE jumper, motorcycle rider, Team Tiger Schulmann mixed martial artist, snowboarder, New York Giants & 30 Rock fan, and dog dad.
Linkedin
eric.lubow.org
Twitter: @elubow

Charity Majors (San Francisco)

Charity Majors is a systems geek and scalability nerd at the red-hot startup Parse, recently acquired by Facebook. Before coming to Parse she built systems for companies like Linden Lab, Shopkick and Cloudmark, and developed a passion for resilient and self-healing architecture. She reads a lot of economics and drinks a lot of whiskey. Charity went to school on a piano performance scholarship and studied philosophy, classical studies and music composition before dropping out to come play in the dotcom games.

Charity helps manage what is likely one of the biggest and most challenging MongoDB deployments in the world, with dynamic indexing and other fun goodies. However, Parse is not a MongoDB-only shop. You'll also find MySQL, Cassandra, Redis, and Hive in the mix. Charity will share with DataDay her insight on why each tool was chosen as well as what her team learned along the way. This is a tremendous case study. Check out her recent talk at ChefConf 2013 and her interview on The Cube.
GitHub
Linkedin
Twitter: @mipsytipsy

Brad Martin (Austin)

Brad Martin is GM & Principal Designer at Paia Corporation, currently developing equipment for data security. He is an enthusiastic and omnivorous technologist, having recently completed projects in ASIC cryptography, high-power AC/DC load control, capacitive touch sense and motor control systems. Following his early work in the development of service procedures for Apple Computer, Brad moved to Cupertino in 1978. He later received the BSEE from UC San Diego before moving to Austin where he has performed DSP and MCU design both directly and as Managing Partner of a large systems and circuit development company there. He is a registered professional engineer in the State of Texas.
Linkedin

Paco Nathan (Mountain View)

Paco Nathan, Chief Scientist for Mesosphere, is known as a “player/coach” data scientist who has led innovative Data teams building large-scale apps for 10+ years. As a recognized expert in distributed systems, machine learning, predictive modeling, cloud computing, Enterprise data workflows, and Open Data, Paco is an O'Reilly author and a developer evangelist for the Mesos and Cascading open source projects. Paco received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 25+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.
Wikipedia entry
Linkedin
Twitter: @pacoid
For those of you interested in a deep dive into machine learning, Paco Nathan will be teaching his Hands-on Introduction to Machine Learning on Friday, January 10. This is a full day course.

Regunathan Radhakrishnan (SF Bay)

Regunathan Radhakrishnan received his M.S. and Ph.D. degrees in EE from Polytechnic University, Brooklyn, NY in 2002 and 2004, respectively. He was a research fellow in the ECE department and also an intern at Mitsubishi Electric Research Labs, Cambridge, MA during his graduate studies. He was with Mitsubishi Electric Research Laboratories (MERL) until May 2006 as a Visiting Researcher. He was one of the leading contributors to MERL's Video Summarization algorithm based on Audio Classification, which is now a differentiating feature of Mitsubishi Electric's DVD recorder in Japan. He joined the sound technology research team at Dolby Laboratories in June 2006 and applied machine learning methods to audio and video data for intelligent metadata creation for eco-system wide solutions. His research interests include statistical machine learning, video summarization, multimedia content identification, watermarking, and spatial audio. He has published several conference papers, as well as 7 journal papers, 5 book chapters, and a book on multimedia content analysis and security. He has filed about 40 patents in the areas of multimedia content analysis, multimedia security and content identification. He is currently serving as an associate editor for the Journal of Multimedia and as a Program Committee member of the SPIE Forensics and Media Security conference. He has received “The Valuable Invention Award” from Mitsubishi Electric Corporation for the work on Video summarization using Audio Analysis and has received the SMPTE Journal paper award for the work on Audio-Video Synchronization.
Linkedin

Srivatsan Ramanujam (SF Bay)

Srivatsan Ramanujam is a Senior Data Scientist at Pivotal where he executes Data Sciences labs for their customers, with a special focus on Text Analytics. Previously, as a Data Scientist at Sony Mobile Communications in Redwood City, he led Sony Mobile's Data Science initiatives that spanned across Statistical Machine Learning and Natural Language Processing. Before joining Sony, he was an engineer in the Analytics team at Salesforce.com. He received a Masters in Computer Sciences from UT Austin, completing his thesis and research in NLP where he focused on graphical models for weakly supervised sequence prediction problems. He loves mountaineering and is a native speaker of Python.
Linkedin

Sam Ritchie (SF Bay)

Sam Ritchie is a dyed-in-the-wool functional programmer, and until recently was a senior software engineer in Twitter’s Infrastructure engineering group. Sam is co-author of a number of open-source Scala and Clojure libraries, including Bijection, Algebird, Storehaus and Cascalog 2.0. Sam holds a bachelor’s degree in mechanical and aerospace engineering.
Sam is also co-author of Summingbird - a platform for streaming map/reduce used at Twitter to build aggregations in real-time or on Hadoop. When the programmer describes her job, that job can be run without change on Storm or Hadoop. Additionally, Summingbird can manage merging realtime/online computations with offline batches so that small errors in real-time do not accumulate.
Sam's current project is a startup called PaddleGuru. PaddleGuru handles everything painful about managing athletic events, from registration, payments, and timing to instant results. As a former sprint kayak racer, Sam sees this as a "back to roots" project.
Github
Linkedin
@sritchie

Matthew Russell (Nashville)

Matthew Russell is Chief Technology Officer at Digital Reasoning, Principal at Zaffra, and author of several books on technology including Mining the Social Web (O'Reilly, 2013), now in its second edition. He is passionate about open source software development, data mining, and creating technology to amplify human intelligence. Matthew studied computer science and jumped out of airplanes at the United States Air Force Academy. When not solving hard problems, he enjoys practicing Bikram Hot Yoga, CrossFitting and participating in triathlons.
Linkedin
Twitter: @ptwobrussell
Matthew Russell will host a joint use case (Got Chaos?) with Dr. Steve Kramer of Paragon Science.
Matthew will also host a workshop on data mining.

Eric Sammer (SF Bay)

Eric Sammer is currently a Principal Solution Architect at Cloudera where he helps customers plan, deploy, develop for, and use Hadoop and the related projects at scale. His background is in the development and operations of distributed, highly concurrent, data ingest and processing systems. He's been involved in the open source community and has contributed to a large number of projects over the last decade. Eric is author of Hadoop Operations, published by O'Reilly.
Linkedin
Blog
Twitter: @esammer

Joe Stein (NYC)

Joe Stein is an Apache Kafka committer and PMC member. A frequent speaker on both Hadoop and Cassandra, Joe is the Founder and Principal Architect of Big Data Open Source Security LLC, a professional services and product solutions company. Joe has been a distributed systems developer and architect for over 12 years, having built backend systems that supported over one hundred million unique devices a day processing trillions of events. He blogs and hosts a podcast about Hadoop and related systems at All Things Hadoop.
Linkedin
@allthingshadoop

Eric Tschetter (San Francisco/ATX)

Eric Tschetter is the lead architect of Druid, Metamarkets’ distributed, in-memory database. He held senior engineering positions at Ning and LinkedIn before joining Metamarkets. At LinkedIn, Eric productized LinkedIn’s PYMK with Hadoop. He holds bachelor’s degrees in Computer Science and Japanese from the University of Texas at Austin, and an M.S. in Computer Science from the University of Tokyo.
Linkedin
Twitter: @zedruid

Josh Wills (SF Bay)

Josh Wills is Cloudera’s Senior Director of Data Science, working with customers and engineers to develop Hadoop-based solutions across a wide range of industries. He is the founder and VP of the Apache Crunch project for creating optimized MapReduce pipelines in Java and lead developer of Cloudera ML, a set of open-source libraries and command-line tools for building machine learning models on Hadoop. Prior to joining Cloudera, Josh worked at Google, where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+.
Linkedin
Twitter: @josh_wills

Want to propose a talk or workshop? We have a few slots left. Send a note to submissions@geekaustin.org. (Please note that we do not accept proposals sent via recruiters, HR staff, marketing or PR Pros.)

Data Day Texas 2013: Wes McKinney with his latest book, Python for Data Analysis, at the O'Reilly book exhibit.