Who will speak at Data Day Texas 2018?

Get $100 off the regular room rate at the official conference hotel. Use the following link to book your room: http://datadaytexas.com/2018/book-a-hotel-room

Below is the first half of the speakers to be announced for Data Day Texas 2018. Is there someone you want to see? Some tool, technology, or project you want to see covered? Send us your thoughts at suggestions@datadaytexas.com. If you would like to propose a session, visit our proposals page.

Keynote Lukas Biewald (SF Bay) @l2k

Lukas Biewald (Wikipedia / LinkedIn / GitHub) is the Founder & Chief Data Scientist of CrowdFlower. Founded in 2007, CrowdFlower provides Labor-on-Demand to help companies outsource high-volume, repetitive tasks to a massively-distributed global workforce.
Before founding CrowdFlower, Lukas was a senior scientist and manager within the Ranking and Management Team at Powerset, Inc., acquired by Microsoft in 2008. He led the Search Relevance Team for Yahoo! Japan after graduating from Stanford University with a B.S. in Mathematics and an M.S. in Computer Science. Recently, Lukas won the Netexplorateur Award for GiveWork – a collaboration with Samasource that brings digital work to refugees worldwide. Lukas is also an expert level Go player.
Check out Lukas' recent interview with Ben Lorica for the O'Reilly Data Show

John Akred (SF Bay) @BigDataAnalysis

John Akred is the Founder and CTO of Silicon Valley Data Science. In the business world, John Akred likes to help organizations become more data driven. He has over 15 years of experience in machine learning, predictive modeling, and analytical system architecture. His focus is on the intersection of data science tools and techniques; data transport, processing and storage technologies; and the data management strategy and practices that can unlock data driven capabilities for an organization. A frequent speaker at the O'Reilly Strata Conferences, John is host of the perennially popular workshop: Building A Data Platform.
John will be giving the following presentation: Machine Learning: From The Lab To The Factory

Mara Averick (Boston) @dataandme

Mara Averick (LinkedIn / GitHub / Medium) is a polymath and self-confessed data nerd. With a strong background in research, she has a breadth of experience in data analysis, visualization, and applications thereof. Currently, by day, she’s a Consultant at TCB Analytics. By night, you’ll find her sharing dope R related stuff on Twitter and translating heavily technical subject matter into easy reading for a non-technical audience. When she’s not talking data, she’s diving into NBA stats, exploring weird and wonderful words, and/or indulging in her obsession with all things Archer. (Thanks to Mango Solutions for bio.)
Mara will be speaking as part of R User Day.

Dave Bechberger (Houston)

Dave Bechberger is a Sr. Architect at Gene by Gene, a genetic genealogy and bioinformatics company, where he works extensively on developing their next-generation data architecture. Dave has spent his career engaging in full stack software development but specializes in building data architectures in complex data domains such as bioinformatics, oil and gas, supply chain management, etc. He uses his knowledge of graph and other big data technologies to build out highly performant and scalable systems. Dave has previously spoken at a variety of international technical conferences including NDC Oslo, NDC London, and Graph Day Texas.

Ryan Boyd (SF Bay)

Ryan Boyd (LinkedIn) is an SF-based software engineer focused on helping developers understand the power of graph databases. Previously he was a product manager for architectural software, built applications and web hosting environments for higher education, and worked in developer relations for twenty products during his 8 years at Google. He enjoys cycling, sailing, skydiving, and many other adventures when not in front of his computer.
Ryan will be giving the following Graph Day Seattle presentation: Combining graph analytics with real-time graph query workloads for solving business problems.

Ben Bromhead (SF Bay)

Ben Bromhead (LinkedIn) is Co-founder and CTO at Instaclustr, where he sets the technical direction for the company. Ben is well known as an active member of the Apache Cassandra community. Prior to Instaclustr, Ben had been working as an independent consultant developing NoSQL solutions for enterprises. He ran a high-tech cryptographic and cyber security formal testing laboratory at BAE Systems and Stratsec.
Ben will be speaking in the Cassandra track at Data Day Texas.

Jeff Carpenter (Scottsdale, Arizona) @jscarp

Jeff Carpenter (LinkedIn) is a technology evangelist at DataStax, where he leverages his background in system architecture, microservices and Apache Cassandra to help developers and operations engineers build distributed systems that are scalable, reliable, and secure. Jeff has worked on projects ranging from a complex battle planning system in an austere network environment to a cloud-based hotel reservation system, and is the author of Cassandra: The Definitive Guide, 2nd Edition.
Jeff will be speaking in the Cassandra track at Data Day Texas.

Mike Downie (Bryan/College Station)

Mike Downie, Technical Lead at Expero, is driven to solve technical problems in the most practical way possible. To do this well he developed a diverse set of skills to truly understand the problem(s) at hand and a depth of technical knowledge to create great solutions.
Lifelong interests in technology, math, and science and that drive to solve problems led Mike to pursue a Computer Science degree at Texas A&M. He wasn’t content to learn about software in a classroom; he wanted to apply concepts in real-world settings as soon as possible. During his final three years of college he worked professionally developing software, first as a co-op student for a (then) national telecommunications company and later part-time for a software consulting company. This ‘dual education’ allowed Mike to graduate with a strong academic foundation and the practical skills to apply his knowledge.
After completing his degree Mike continued to work for the consulting company, helping the company grow from 4 employees to more than 60 over the course of a decade. He gained expertise through success on a diverse set of projects. Project work included data acquisition and control, database applications and user interface design. During this period Mike also built software teams, studying software project management, estimation, requirements gathering, application lifecycle management and learning to effectively mentor junior developers.
Mike’s most recent work includes database design and front-end development for enterprise resource planning systems. He continues to look for difficult problems needing great solutions. What makes a great solution? He says first and foremost it has to work well, meeting both the functional and non-functional requirements. A solution can’t just work; to be great, it has to work fast enough, be intuitive to the user, and be maintainable by the developers.

Jasmine Dumas (Connecticut) @jasdumas

Jasmine Dumas (LinkedIn / GitHub) is a Data Scientist at Simple Finance where she is focused on experimentation and data product development. She earned a B.S.E. in Biomedical Engineering from the University of Hartford and has experience in Aerospace Manufacturing, Medical Devices and Financial Technology. She is an active member of the R programming community, has developed the open source packages shinyGEO, ttbbeer, shinyLP, and gramr, and has participated in Google Summer of Code, NASA Datanauts, R-Ladies, and Forwards. She is currently developing a course on shiny with DataCamp and co-organizing the regional Noreast'R Conference.
Jasmine will be speaking as part of R User Day.

Joey Echeverria (SF Bay) @fwiffo

Joey Echeverria is the platform technical lead at Splunk, where he builds applications for scaling IT operations built on the Apache Hadoop platform. Joey is a committer on the Kite SDK, an Apache-licensed data API for the Hadoop ecosystem. Joey was previously a software engineer at Cloudera, where he contributed to several ASF projects including Apache Flume, Apache Sqoop, Apache Hadoop, and Apache HBase. Joey is also a coauthor of Hadoop Security, published by O'Reilly Media.

Jonathan Ellis (Austin) @spyced

Jonathan Ellis is CTO and co-founder at DataStax. Prior to DataStax, Jonathan worked extensively with Apache Cassandra while employed at Rackspace. Prior to Rackspace, Jonathan built a multi-petabyte, scalable storage system based on Reed-Solomon encoding for backup provider Mozy.
Jonathan will be speaking in the Cassandra track at Data Day Texas.

Alex Engler (Washington, D.C.) @alexcengler

Alex Engler (LinkedIn / GitHub / Urban Institute / Georgetown / Johns Hopkins) is the Program Director and Lecturer for the M.S. in Computational Analysis and Public Policy program at the University of Chicago. He is also a contributing data scientist to the Urban Institute, where he worked before UChicago. Alex also previously taught visualization and data science for policy analysis at Georgetown University and Johns Hopkins University.
Alex will be presenting the following workshop: Introduction to SparkR in AWS EMR, as part of R User Day.

Dr. Denise Koessler Gosnell (Charleston) @DeniseKGosnell

In August 2017, Dr. Denise Gosnell transitioned into a Solutions Architect position with DataStax, where she aspires to build upon her experiences as a data scientist and graph architect to further their established line of graph solutions. Prior to her role with DataStax, Dr. Gosnell was a Data Scientist and Technology Evangelist at PokitDok. During her three years with PokitDok, she built software solutions for and spoke at over a dozen conferences on permissioned blockchains, machine learning applications of graph analytics, and data science within the healthcare industry.
Dr. Gosnell earned her Ph.D. in Computer Science from the University of Tennessee. Her research on how our online interactions leave behind unique identifiers that form a “social fingerprint” led to presentations at major conferences from San Diego to London and drew the interest of such tech industry giants as Microsoft Research and Apple. Additionally, she was a leader in addressing the underrepresentation of women in her field and founded a branch of Sheryl Sandberg’s Lean In Circles.

Jon Haddad (Los Angeles) @rustyrazorblade

Jon Haddad (LinkedIn) is Principal Consultant at The Last Pickle, where he helps customers of all sizes succeed with Apache Cassandra. An Apache Cassandra PMC Member, Jon co-wrote cqlengine, the Python object mapper for CQL.
Jon has 15 years of experience in both development and operations. For the last 11 years, he's worked at various startups in southern California, including two years as an evangelist at DataStax.

Chester Ismay (Portland) @old_man_chester

Chester Ismay (LinkedIn / GitHub) is Curriculum Lead at DataCamp. He was formerly an Adjunct Professor of Sociology at Pacific University and an Instructional Technologist and Consultant for Data Science, Statistics, and R at Reed College. He obtained his PhD in statistics from Arizona State University and has taught courses and led workshops in statistics, data science, mathematics, computer science, and sociology. He is the co-author of the fivethirtyeight R data package and is the author of the thesisdown R package. He is also a co-author of an open source textbook entitled ModernDive: An Introduction to Statistical and Data Sciences via R.
Chester will be speaking as part of R User Day.

Holden Karau (San Francisco) @holdenkarau

Holden Karau is a transgender Canadian, Apache Spark committer, an active open source contributor, and co-author of Learning Spark & High Performance Spark. When not in San Francisco working as a software development engineer at IBM’s Spark Technology Center, Holden talks internationally on Spark and holds office hours at coffee shops at home and abroad. She makes frequent contributions to Spark, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of computers she enjoys dancing & playing with fire.

Jason Kessler (Seattle) @jasonkessler

Jason Kessler (LinkedIn) is a lead data scientist at CDK Global, where he analyzes language use and consumer behavior in the online auto-shopping ecosystem. Prior to joining CDK, Jason was the founding data scientist at PlaceIQ and worked as a research scientist for JD Power and Associates. He has published peer-reviewed papers on algorithms and corpora for sentiment and belief analysis and has sat on program committees and reviewed for several AI and NLP conferences. Most recently, he has conducted research on identifying persuasive and influential language and the visualization of differing corpora.
Jason will be giving the following presentation: Lexicon Mining for Semiotic Squares: Exploding Binary Classification

Albert Y. Kim (Amherst) @rudeboybert

Albert Y. Kim (LinkedIn / GitHub) is a Lecturer in Statistics in the Mathematics & Statistics Department at Amherst College. Born in Montreal Quebec, he earned his BSc in Mathematics and Computer Science from McGill University in 2004 and his PhD in Statistics from the University of Washington in 2011. Prior to joining Amherst College, he was a Decision Support Engineering Analyst in the AdWords division of Google Inc, a Visiting Assistant Professor of Statistics at Reed College, and an Assistant Professor of Statistics at Middlebury College.
Albert will be speaking as part of R User Day.

Gunnar Kleemann (Berkeley / Austin)

Gunnar Kleemann is a Data Scientist with the Berkeley Data Science Group (BDSG). He is interested in how data science facilitates biological discovery and lowers the barrier to high-throughput research, particularly in small, independent labs. In addition to his work with BDSG, he is also involved in the development and implementation of technologies like the ATX Hackerspace Biology Laboratory.
Gunnar holds a PhD in Molecular Genetics from Albert Einstein College of Medicine and a Master’s in Data Science from UC Berkeley. He did post-doctoral research on the genomics of aging at Princeton University, where his research focused on developing high throughput robotic assays to understand how genetic changes alter lifespan and reproductive biology.

Steve Kramer (Austin) @ParagonSci_Inc

Steve Kramer (LinkedIn) is the President and Chief Scientist of Paragon Science, a company he founded with the goal of developing cutting-edge technologies to aid in the counter-terrorism efforts of the United States. He has since expanded Paragon Science's scope to focus on providing valuable business intelligence in the commercial data-mining industry.
In 2005, Dr. Kramer started his current research in graph theory, network analysis, and complex systems theory, yielding Paragon's patent-pending dynamic anomaly detection technologies. He has performed data-mining consulting work for multiple clients, including The Advisory Board, Digital Motorworks, RetailMeNot, and Vast.com. He presented his paper "Anomaly detection in extremist web forums using a dynamical systems approach" at the 2010 ACM SIGKDD Workshop on Intelligence and Security Informatics (ISI-KDD 2010) and at the Pentagon. He also recently served as a program committee member and paper reviewer for IEEE International Conferences on Intelligence and Security Informatics 2011, IEEE International Conferences on Intelligence and Security Informatics 2012, ACM SIGKDD Workshop on Intelligence and Security Informatics 2012, IEEE Intelligence and Security Informatics 2013, and FOSINT-SI 2013 (International Symposium on Foundations of Open Source Intelligence and Security Informatics).

Jared Lander (NYC) @jaredlander

Jared Lander (LinkedIn) is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City; the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference; and an Adjunct Professor of Statistics at Columbia University. With a master's from Columbia University in statistics and a bachelor's from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fund raising to finance and humanitarian relief efforts.
Jared specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R Programming geared toward Data Scientists and Non-Statisticians alike, and is creating a course on glmnet with DataCamp.
Jared will be speaking as part of R User Day.

Dor Laor (Sunnyvale, CA) @dorlaor

Dor Laor is the CEO of ScyllaDB, the company behind the open source Cassandra-compatible database of the same name. Previously, Dor was part of the founding team of the KVM hypervisor under Qumranet, which was acquired by Red Hat. At Red Hat, Dor managed the KVM and Xen development for several years. Dor holds an MSc from the Technion and a PhD in snowboarding.

Victor Lee (Kent, Ohio)

Dr. Victor Lee is Senior Product Manager at TigerGraph, bringing together a strong academic background, decades of experience in the technology sector, and a strong commitment to quality and serving customer needs. His first stint in Silicon Valley was as an IC circuit designer and technology transfer manager, before returning to school for his computer science PhD, focusing on graph data mining. He received his BS in Electrical Engineering and Computer Science from UC Berkeley, MS in Electrical Engineering from Stanford University, and PhD in Computer Science from Kent State University. Before joining TigerGraph, Victor was a visiting professor at John Carroll University.
Dr. Lee will be giving the following Graph Day presentation: Real-time deep link analytics: The next stage of graph analytics

William Lyon (SF Bay) @lyonwj

William Lyon is a software developer at Neo4j, the open source graph database. As an engineer on the Developer Relations team, he works primarily on integrating Neo4j with other technologies, building demo apps, helping other developers build applications with Neo4j, and writing documentation. Prior to joining Neo, William worked as a software developer for several startups in the real estate software, quantitative finance, and predictive API fields. William holds a Masters degree in Computer Science from the University of Montana. You can find him online at lyonwj.com.

Rob McDaniel (Seattle)

Rob McDaniel is the founder of Lingistic, the machine learning team behind howbiased.com, which has a focus on NLP problems related to politics, debate analysis and the detection of bias in news media. HowBiased.com hopes to help humans learn to be more critical of the material they ingest, by identifying traits and cues in the language which may be hidden or non-obvious.
Rob has a diverse background in engineering and machine learning, both with major corporations and startups. He has worked on problems related to machine translation, taxonomy classification and information extraction, and has a passion for unsupervised methods and graph theory. When not working on his startup, Rob is also Manager of Applied Science at Rakuten, where he manages AI that expands the depth and quality of Rakuten's global product catalog.

Patrick McFadin (SF Bay) @patrickmcfadin

Patrick McFadin (LinkedIn), VP of Developer Relations at DataStax, is regarded as one of the foremost experts on Apache Cassandra and data modeling techniques. While at DataStax, he has helped build some of the largest deployments in the world. Prior to DataStax, Patrick was Chief Architect at Hobsons, an education services company. There, he spoke often on web application design and performance.

Lucy McGowan (Nashville) @LucyStats

Lucy D'Agostino McGowan (LinkedIn / GitHub) is a Biostatistics PhD candidate at Vanderbilt University where her research focuses on observational studies, large-scale inference, and methods for quantifying and estimating the effect of unmeasured confounding. She is the co-founder of R-Ladies Nashville and is enthusiastic about learning from and uplifting other women in the R and STEM communities.
Lucy will be speaking as part of R User Day.

Jessica Minnier (Portland) @datapointier

Jessica Minnier (LinkedIn / GitHub) is an Assistant Professor of Biostatistics at Oregon Health & Science University. She is a faculty member of the OHSU-PSU School of Public Health with appointments in the Knight Cardiovascular Institute and Knight Cancer Institute Biostatistics Shared Resource. Her statistical research interests include risk prediction with high dimensional data sets and the analysis of genetic and other omics data. She is also interested in statistical computing (mostly in R), reproducible research and open science.
Jessica teaches Mathematics/Statistics II, a statistical inference course for the MS in Biostatistics program at OHSU-PSU School of Public Health. Jessica has an A.M. and Ph.D. in Biostatistics from Harvard University and a B.A. in Mathematics with minor in Computer Science from Lewis & Clark College.
Jessica will be speaking as part of R User Day.

Qazaleh Mirsharif (San Francisco)

Qazaleh Mirsharif (LinkedIn / Google Scholar) is a Machine Learning Scientist, specializing in Computer Vision, at CrowdFlower. Bio forthcoming.
Qazaleh will be speaking as part of AI Weekend.

Jonathan Mugan (Austin) @jmugan

Jonathan Mugan (LinkedIn) is a researcher specializing in artificial intelligence, machine learning, and natural language processing. His current research focuses on deep learning for natural language generation and understanding. Dr. Mugan received his Ph.D. in Computer Science from the University of Texas at Austin. His thesis was centered in developmental robotics, which is an area of research that seeks to understand how robots can learn about the world in the same way that human children do. Dr. Mugan also held a post-doctoral position at Carnegie Mellon University, where he worked at the intersection of machine learning and human-computer interaction. One of the most requested speakers at the Data Day Texas conferences, he recently also spoke on the topic of NLP at the O’Reilly AI conference, and is the creator of the O’Reilly video course Natural Language Text Processing with Python. Dr. Mugan is also the author of The Curiosity Cycle: Preparing Your Child for the Ongoing Technological Explosion.
Jonathan will be giving the following presentation: Generating Natural-Language Text with Neural Networks

Jonathan Nolis (Seattle) @skyetetra

Jonathan Nolis (LinkedIn / GitHub) is the Director of Insights & Analytics at Lenati, and is the lead of the Customer Insights & Analytics team. He has over a decade of experience in solving business problems using data science. Jonathan has provided insights and strategic advice in industries such as retail, manufacturing, aerospace, health care, and e-commerce. Jonathan helps create proprietary technology for Lenati including the Loyalty Program ROI Simulator – a tool that uses big data to predict the value of a loyalty program. He has a PhD in industrial engineering, and has several academic publications in the field of applied optimization. Prior to joining Lenati, Jonathan was a Lead of Advanced Analytics at Promontory Financial Group, a regulatory compliance consulting firm.
Jonathan will be speaking as part of R User Day.

Hilary Parker (San Francisco) @hspter

Hilary Parker (LinkedIn / GitHub) is a Data Scientist at Stitch Fix and co-host of the Not So Standard Deviations podcast. She is an R and statistics enthusiast determined to bring rigor to analysis wherever she goes. At Stitch Fix she works on teasing apart correlation from causation, with a strong dose of reproducibility. Formerly a Senior Data Analyst at Etsy, she received a PhD in Biostatistics from the Johns Hopkins Bloomberg School of Public Health.
Hilary will be speaking as part of R User Day.

Josh Perryman (Bryan / College Station) @joshperryman

Josh Perryman likes to play with data. Oftentimes this is implementing proprietary algorithms closer to the data for performance or scale. Sometimes it is ad-hoc investigation and analysis, a sort of exploratory querying. A few times he’s been able to leverage his experience with data engines for dramatic performance improvements. But the real joy is designing a schema for both functionality and performance, one which increases the productivity of other developers and enables a technology to solve new problems or deliver new value to the business.
But technology isn't just data, and he does more than just play with data. He’s worked with high performance computing (HPC) environments, taking computations from hours to minutes or seconds. He has built visualizations which deliver new insights into complex data domains. He’s managed technology personnel, both directly and indirectly, to deliver technology solutions. He has put together more types of technology components, software and hardware, than can be counted, because one of his fortes is solving problems by building sustainable systems.

Alan Pita (Austin)

Alan Pita, Graph Developer and Architect at Expero, has 20+ years of experience as a developer and architect for high-performance software-hardware systems scaling to multiple data centers with thousands of participating nodes each. He specializes in helping firms productize and monetize complex software technology from emerging research. He has three patents stemming from 10 years of work at IBM’s Server Division. He has led globally distributed technical teams, mined and managed agile product requirements, and built a proven track record of delivering boundary-defying technical innovations. Alan brings extensive technical experience in the areas of programming languages, complex SoC design, computer system architecture and functional design verification. Alan has a Bachelor’s Degree in Computer Science and Engineering from Texas A&M University and a Master's Degree in Computer Science from Stanford University. He is currently on hiatus from the Ph.D. program in Electrical and Computer Engineering at the University of Texas at Austin. He is also a graduate of the IBM Leadership Excellence course.

Aaron Ploetz (Minneapolis) @APloetz

Aaron Ploetz (LinkedIn) has been a professional software developer since 1997, and has been named a DataStax MVP for Apache Cassandra three times (2014-17). Outside of work, he has been a computer hobbyist since 1987 (when his mother first brought home a Tandy 1000 EX). He still works on a variety of projects in his home lab, including (but not limited to) building Linux servers, gaming machines, and test Cassandra clusters. Aaron received a Bachelor of Science degree in Management/Computer Systems from the University of Wisconsin - Whitewater in 1998, and a Master of Science degree in Software Engineering (emphasis on Database Technologies) from Regis University in 2013. He and his wife Coriene live with their three children in the Twin Cities. When not in front of a computer he enjoys amateur astronomy, writing, and coaching his sons' baseball and ice hockey teams.

Gabriela de Queiroz (San Francisco) @gdequeiroz

Gabriela de Queiroz (LinkedIn / GitHub) is the Lead Data Scientist at SelfScore. Formerly, Gabriela was a data scientist at Sharethrough, where she developed statistical models from concept creation to production; designed, ran, and analyzed experiments; and employed a variety of techniques to derive insights and drive data-centric decisions. Gabriela is the founder of R-Ladies, an organization created to promote diversity in the R community, which now has over 25 chapters worldwide. Currently, she is developing an online course on machine learning in partnership with DataCamp.
Gabriela will be speaking as part of R User Day.

Karthik Ramasamy (San Francisco) @karthikz

Karthik Ramasamy (LinkedIn) is the co-founder of Streamlio - a company that focuses on building next generation real time infrastructure. Before Streamlio, Karthik was the engineering manager and technical lead for real-time infrastructure at Twitter where he co-created Twitter Heron. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He co-founded Locomatix, a company that specializes in real-time streaming processing on Hadoop and Cassandra using SQL, that was acquired by Twitter. Before Locomatix, he had a brief stint with Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases, and high availability solutions for network routers that are widely deployed on the internet. Before joining Juniper, at the University of Wisconsin he worked extensively in parallel database systems, query processing, scale out technologies, storage engines, and online analytical systems. Several of these research projects were later spun off as a company acquired by Teradata. Karthik is the author of several publications, patents, and Network Routing: Algorithms, Protocols and Architectures. He has a Ph.D. in computer science from the University of Wisconsin, Madison with a focus on big data and databases.

Andrew Ray (Bentonville, Arkansas)

Andrew Ray is a Senior Technical Expert at Sam’s Club Technology. He is passionate about big data and has extensive experience working with Apache Spark and Hadoop. Andrew is an active contributor to the Apache Spark project including SparkSQL and GraphX. At Walmart Andrew built an analytics platform on Hadoop that integrated data from multiple retail channels using fuzzy matching and distributed graph algorithms. Andrew also led the adoption of Spark at Walmart from proof-of-concept to production. Andrew earned his Ph.D. in Mathematics from the University of Nebraska, where he worked on extremal graph theory.
Andrew will be giving the following Graph Day presentation: Writing Distributed Graph Algorithms

David Robinson (NYC) @drob

David Robinson (LinkedIn / GitHub) is a data scientist at Stack Overflow with a PhD in Quantitative and Computational Biology from Princeton University. He enjoys developing open source R packages, including broom, gganimate, fuzzyjoin and widyr, as well as blogging about statistics, R, and text mining on his blog, Variance Explained.
David will be giving the following R User Day presentation: We R What We Ask: The Landscape of R Users on Stack Overflow.

Emily Robinson (NYC) @robinson_es

Emily Robinson (LinkedIn / GitHub) works as a Data Analyst at Etsy with the search team to design, implement, and analyze experiments on the ranking algorithm, UI changes, and new features. Emily earned her master’s in Organizational Behavior from INSEAD in 2016 and her bachelor’s in Decision Sciences from Rice University (where she took classes from Hadley Wickham). She's a co-organizer of the NYC chapter of R-Ladies, a global organization to promote gender diversity in the R community. She enjoys blogging about A/B Testing, conferences, and data science projects on her blog, Hooked on Data.
Emily will be speaking as part of R User Day.

Juan Sequeda (Austin) @juansequeda

Dr. Juan Sequeda is the co-founder of Capsenta and the developer of Ultrawrap, a system that virtualizes relational databases as graph data sources. His research interests are on the intersection of Logic and Data and in particular between the Semantic Web and Relational Databases for data integration. Juan holds a Ph.D. in Computer Science from the University of Texas at Austin. Capsenta is a spin-off from his PhD research. Juan is the recipient of the NSF Graduate Research Fellowship, Best Student Paper at the 2014 International Semantic Web Conference, and 2nd Place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org. Juan is on the editorial board of the Journal of Web Semantics and has been an invited expert member and standards editor for the World Wide Web Consortium (W3C) Relational Database to RDF Graph working group.
Check out our recent interview with Juan Sequeda.

Julia Silge (Salt Lake City) @juliasilge

Julia Silge (LinkedIn / GitHub) is a data scientist at Stack Overflow. She enjoys making beautiful charts, the statistical programming language R, black coffee, red wine, and the mountains of her adopted home here in Utah. She has a PhD in astrophysics and an abiding love for Jane Austen. Her work involves analyzing and modeling complex data sets while communicating about technical topics with diverse audiences.
Julia will be speaking as part of R User Day.

David Smith (Chicago) @revodavid

David Smith is the R Community Lead at Microsoft. With a background in data science, he writes daily about applications of predictive analytics at the Revolutions blog (blog.revolutionanalytics.com), and is a co-author of Introduction to R.
David will be speaking as part of R User Day.

Nick Strayer (Nashville) @NicholasStrayer

Nick Strayer (LinkedIn / GitHub) has worked in many different realms, including as a journalist at the New York Times, a data scientist at Dealer.com in Vermont, and a "data artist in residence" at tech startup Conduce in California. Currently, he is a PhD student in biostatistics at Vanderbilt University and also an intern at the Johns Hopkins Data Science Lab. Recently (May '15), he graduated from the University of Vermont, where he majored in mathematics and statistics and minored in computer science.
Nick likes data. Manipulating it, modeling it, making it (simulation), visualizing it and yes, even cleaning it. He does these things with some combination of R, Python and JavaScript (d3.js in particular). Most recently he has been fascinated with conveying complex statistical topics and methods using intuitive and interactive graphics.
Nick's current research interests include data gathering, extracting inference from machine learning, data visualization and scientific communication. When not in "school mode" Nick loves to bike places, read science fiction and wander around gardens and museums.
Nick will be speaking as part of R User Day.

Denis Vrdoljak (SF Bay)

Denis Vrdoljak is Co-Founder and Managing Director at the Berkeley Data Science Group (BDSG). He is a Berkeley-trained Data Scientist and a Certified ScrumMaster (CSM), with a background in Project Management. He has experience working with a variety of data types, from intelligence analysis to electronics QA to business analytics. In Data Science, his passion and current focus is Machine Learning-based Predictive Analytics and Network Graph Analysis. He holds a Master's in Data Science from UC Berkeley and a Master's in International Affairs from Texas A&M.

Claudius Weinberger (Köln, Germany) @weinberger

Claudius Weinberger is the CEO and Co-founder of ArangoDB GmbH, the company behind the identically named NoSQL multi-model database. Claudius has been a serial entrepreneur for the majority of his life. Together with his co-founder, he has been busy building databases for more than 20 years. He started with in-memory to mostly-memory databases, then moved to K/V stores, multi-dimensional cubes and ultimately graph databases. Throughout the years he focused mostly on product and project management, further sharpening his vision of the database market. He co-founded ArangoDB in 2012. Claudius studied economics with a focus on business informatics at the University of Cologne. He spends all his free time with his two little daughters, is a judo enthusiast and occasionally enjoys gardening.

Ted Wilmes @trwilmes

Ted Wilmes is passionate about learning complex systems top to bottom and he enjoys applying this knowledge to help customers with their data architecture and performance tuning needs. Over the past few years he has been involved in the rapidly growing graph database space and is an active committer and PMC member on the Apache TinkerPop project.

Daniel Woodie (Austin) @DanielWoodie5

Daniel Woodie is founder and lead scientist of Bamboo Analytics, a data science services firm. He originally trained as a statistician and has worked on applications ranging from systems neuroscience to global supply chains. With Bamboo Analytics he offers analytical consulting and training to early-stage startups and Fortune 500 companies alike.
Daniel will be the emcee for R User Day at Data Day Texas.



Emil Eifrem of Neo4j describing the evolution of the property graph model.

Rob McDaniel (Seattle) of LiveStories was one of the highest-rated speakers at Data Day Texas 2017.