Who will speak at Data Day Texas 2019?

Our discount hotel block sells out early. For the best selection, book a room now.

We're now beginning to announce the first wave of speakers for Data Day Texas 2019. We'll be adding speakers every few days. Bookmark this page for updates. If you'd like to see the list of speakers who appeared at Data Day Texas 2018, you can view it here.

Opening Keynote
Gwen Shapira (Cupertino, CA) @gwenshap

Gwen Shapira (LinkedIn) is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building real-time, reliable data-processing pipelines using Apache Kafka. Gwen is an Oracle ACE Director, the co-author of two O'Reilly books, Kafka: The Definitive Guide and Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.
Gwen will be presenting the Opening Keynote: Lies Enterprise Architects Tell

Data Science Keynote
Sean Owen (Austin) @sean_r_owen

Sean Owen (Quora / LinkedIn) is the Data Science Lead at Databricks. Previously, Sean was director of data science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd. (now the Oryx project) to commercialize large-scale real-time recommender systems on Hadoop. Sean is an Apache Spark committer, was a committer and VP for Apache Mahout, and is the co-author of Advanced Analytics with Spark and Mahout in Action. Earlier in his career, Sean was a senior engineer at Google.
Sean will present the following session: Obvious conclusions that are actually wrong.

NLP Keynote
Robert Munro (San Francisco) @WWRob

Robert Munro (LinkedIn) was most recently Chief Technology Officer at Figure Eight (formerly known as CrowdFlower). Previously, he ran Product for AWS's first Natural Language Processing services in the Deep Learning team at Amazon AI. Robert is an expert in combining Human and Machine Intelligence, working with Machine Learning approaches to Text, Speech, Image and Video Processing. Robert has founded several AI companies, building some of the top teams in Artificial Intelligence. He has worked in many diverse environments, from Sierra Leone, Haiti and the Amazon, to London, Sydney and Silicon Valley, in organizations ranging from startups to the United Nations. Robert has published more than 50 papers and has a PhD from Stanford University.

Graph Keynote
Dr. Denise Koessler Gosnell (Charleston) @DeniseKGosnell

Dr. Denise Gosnell leads a team at DataStax that builds some of the largest distributed graph applications in the world. Her passion centers on examining, applying, and evangelizing the applications of graph data and complex graph problems. As an NSF Fellow, Dr. Gosnell earned her Ph.D. in Computer Science from the University of Tennessee. Her research coined the concept of "social fingerprinting" by applying graph algorithms to predict user identity from social media interactions. Since then, Dr. Gosnell has built, published, patented, and spoken on dozens of topics related to graph theory, graph algorithms, graph databases, and applications of graph data across all industry verticals.
Dr. Gosnell will be presenting the Graph Summit Keynote: From Theory to Production.

Jans Aasman (SF Bay)

Jans Aasman (Wikipedia / LinkedIn) is a Ph.D. psychologist and expert in Cognitive Science, as well as CEO of Franz Inc., an early innovator in Artificial Intelligence and provider of the graph database AllegroGraph. As both a scientist and CEO, Dr. Aasman continues to break ground in the areas of Artificial Intelligence and Knowledge Graphs as he works hand-in-hand with numerous Fortune 500 organizations as well as US and foreign governments. Jans recently authored an IEEE article on “Enterprise Knowledge Graphs”.
Dr. Aasman spent a large part of his professional life in telecommunications research, specializing in applied Artificial Intelligence projects and intelligent user interfaces. He earned patents in the areas of speech technology, multimodal user interaction, and recommendation engines while developing precursor technology for tablets and personal assistants. He was also a professor in the Industrial Design department of the Technical University of Delft. Dr. Aasman is a noted conference speaker at events such as Smart Data, NoSQL Now, International Semantic Web Conference, GeoWeb, AAAI, Enterprise Data World, Text Analytics, and TTI Vanguard, to name a few.
Dr. Aasman will co-present the following Graph Summit session: The Intelligent Sales Organization Runs on Speech Recognition, Knowledge Graphs and AI.

Jon Allen (San Francisco)

Jon Allen is a Senior Data Scientist at SyncThink and a stand-up comedian. Jon is a physicist who studied at UT Austin’s Center for Relativity. After leaving academia, Jon worked with start-ups from MIT’s Media Lab on automated gait analysis and later, in 2010, co-founded Ravel, which specialized in large-scale data solutions for corporate marketing groups. Jon moved to the Bay Area in 2012 and has worked extensively as a data scientist in the medical and hardware spaces. He also founded, runs, and regularly performs at Cheaper Than Therapy, one of the largest independent comedy clubs in the US.
Jon will present the following session: The Role of Data Science

Jesse Anderson (Reno) @jessetanderson

Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. He works on Big Data with companies ranging from startups to Fortune 100 companies. This includes training on cutting-edge technologies like Apache Kafka, Apache Hadoop, and Apache Spark. He has taught over 30,000 people the skills to become data engineers. He is widely regarded as an expert in the field and is known for his novel teaching practices. Jesse has published with O’Reilly and the Pragmatic Programmers. He has been covered in prestigious publications such as The Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at Jesse-Anderson.com.
Jesse will present the following session: Creating a Data Engineering Culture

Roger Barga (Seattle)

Roger Barga is General Manager for the New Cloud Service Initiative at Amazon. Prior to that, Roger was general manager and director of development at Amazon Web Services, where he was responsible for Kinesis data streaming services. Previously, Roger was in the Cloud Machine Learning Group at Microsoft, where he was responsible for product management of the Azure Machine Learning service. Roger is also an affiliate professor at the University of Washington, where he is a lecturer in the Data Science and Machine Learning programs. Roger holds a PhD in computer science, has been granted over 30 patents, has published over 100 peer-reviewed technical papers and book chapters, and has authored a book on predictive analytics.
Roger will present the following session: Extracting Real-Time Insights from Streaming Data

Dave Bechberger (Houston) @bechbd

Dave Bechberger is a Sr. Architect at Gene by Gene, a genetic genealogy and bioinformatics company, where he works extensively on developing their next-generation data architecture. Dave has spent his career engaging in full-stack software development but specializes in building data architectures in complex data domains such as bioinformatics, oil and gas, and supply chain management. He uses his knowledge of graph and other big data technologies to build highly performant and scalable systems. Dave has previously spoken at a variety of international technical conferences, including NDC Oslo, NDC London, and Graph Day Texas.

Michael Berthold (Konstanz)

Michael Berthold is currently president of KNIME.com AG and co-creator of KNIME (wikipedia entry), the open analytics platform used by thousands of data experts around the world. Since August 2003, Michael has been the Nycomed-Chair for Bioinformatics and Information Mining at Konstanz University, Germany where his research focuses on using machine learning methods for the interactive analysis of large information repositories in the Life Sciences. Previously he held positions in both academia (Carnegie Mellon, UC Berkeley) and industry (Intel, Tripos).
Michael is Past President of the North American Fuzzy Information Processing Society, Associate Editor of several journals, and the President of the IEEE Systems, Man, and Cybernetics Society. He has been involved in the organization of various conferences, most notably the IDA series of symposia on Intelligent Data Analysis and the conference series on Computational Life Science. Together with David Hand, he co-edited the textbook Intelligent Data Analysis: An Introduction, which has recently appeared in a completely revised second edition. He is also co-author of Guide to Intelligent Data Analysis (Springer Verlag), which appeared in summer 2010. When time permits, Michael still writes code.

Ryan Boyd (SF Bay) @ryguyrg

Ryan Boyd (LinkedIn) is a San Francisco-based software engineer at Neo4j focused on helping developers understand the power of graph databases. Previously he was a product manager for architectural software, built applications and web hosting environments for higher education, and worked in developer relations for twenty products during his 8 years at Google. He enjoys cycling, sailing, skydiving, and many other adventures when not in front of his computer.
Ryan has consistently been one of the highest-rated speakers at our conference. We're happy that he has agreed to return to Austin.

Michelle Casbon (San Francisco) @texasmichelle

Michelle Casbon is a senior engineer on the Google Cloud Platform developer relations team, where she focuses on open source contributions and community engagement for machine learning and big data tools. Michelle’s development experience spans more than a decade and has primarily focused on multilingual natural language processing, system architecture and integration, and continuous delivery pipelines for machine learning applications. Previously, she was a senior engineer and director of data science at several San Francisco-based startups, building and shipping machine learning products on distributed platforms using both AWS and GCP. She especially loves working with open source projects and has contributed to Apache Spark and Apache Flume. Her writing has been featured in the AI section of O’Reilly Radar. Michelle holds a master’s degree from the University of Cambridge.
Michelle will present the following session: Kubeflow explained: Portable machine learning on Kubernetes.

Dr. Artem Chebotko (Houston) @artemchebotko

Dr. Artem Chebotko is a Solutions Architect at DataStax. His core expertise is in data modeling, data management, data mining, and data analytics. For over 15 years, he has been leading and participating in research and development projects on NoSQL, Graph, XML, Relational, and Provenance databases. He is the inventor of the Big Data Modeling Methodology for Apache Cassandra and the author of over 50 research and technical papers published in international journals and conference proceedings. He is an educator with extensive experience in both industry and academic training.
Artem will present the following graph workshop: Hands-On Introduction to Gremlin Traversals

Shannon Copeland (Atlanta)

Shannon Copeland has more than 20 years’ experience driving change and innovation in the services industry. As Chief Operating Officer of N3, Shannon is responsible for overseeing all business operations while growing the company’s profitability and cash flow. Shannon also leads N3’s innovative efforts to develop unique machine learning tools to drive operating efficiency and client results.
Before joining N3, Shannon served as a Director for Huron Consulting and as the Director of Strategy and Group COO of a global law firm leading strategy and operations throughout 18 offices in the United States, Europe, and the Middle East. Prior to that, Shannon held positions at Chevron, Deloitte, and Georgia-Pacific.
Shannon holds a bachelor’s degree in Civil Engineering from the Georgia Institute of Technology and an MBA from the Wharton School.
Shannon will co-present the following Graph Summit session: The Intelligent Sales Organization Runs on Speech Recognition, Knowledge Graphs and AI.

Sanghamitra Deb (SF Bay) @sangha_deb

Sanghamitra Deb is a Data Scientist at Chegg, where she works on problems related to school and college education to sustain and improve the learning process. Her work involves recommendation systems, graph modeling, deep NLP analysis, data pipelines, and machine learning. Previously, Sanghamitra was a data scientist at Accenture, where she worked on a wide variety of problems related to data modeling, architecture, and visual storytelling. Sanghamitra is active in Data Science outreach and believes in applying analytics to a range of domains such as pharma, HR, customer support, and market research. Before becoming a data scientist, she was an astrophysicist who studied the structure of the universe by modeling galaxy clusters.
Sanghamitra will present the following session: Using weak supervision and transfer learning techniques to build a knowledge graph to improve student experiences at Chegg.

Michael Freedman (New York) @michaelfreedman

Michael Freedman (LinkedIn) is the cofounder and CTO of TimescaleDB, an open source database that scales SQL for time series data, and a professor of computer science at Princeton University, where his research focuses on distributed systems, networking, and security. Previously, Michael developed CoralCDN (a decentralized CDN serving millions of daily users) and Ethane (the basis for OpenFlow and software-defined networking) and cofounded Illuminics Systems (acquired by Quova, now part of Neustar). He is a technical advisor to Blockstack. Michael’s honors include the Presidential Early Career Award for Scientists and Engineers (PECASE, given by President Obama), the SIGCOMM Test of Time Award, the Caspar Bowden Award for Privacy Enhancing Technologies, a Sloan Fellowship, the NSF CAREER Award, the Office of Naval Research Young Investigator Award, a DARPA Computer Science Study Group membership, and multiple award-winning publications. He holds a PhD in computer science from NYU’s Courant Institute and bachelor’s and master’s degrees from MIT.
Michael will present the following session: Performant time-series data management and analytics with Postgres.

Nick Gaylord (San Francisco) @texastacos

Nick Gaylord is a senior member of the data science team in Johnson & Johnson's Health Technology group, where he works on a wide range of digital health and wellness applications as well as helping to evangelize data-driven innovation across the enterprise. His previous roles include work on CRM solutions for small business owners at Womply and helping to build human-in-the-loop machine learning platforms at Figure Eight and Idibon. He has a PhD from the University of Texas at Austin, and in his spare time he fixes bikes and collaborates on work applying cognitive science to the public health domain.

Holden Karau (San Francisco) @holdenkarau

Holden Karau is a transgender Canadian, Apache Spark committer, an active open source contributor, and co-author of Learning Spark & High Performance Spark. When not in San Francisco working as a software development engineer at IBM’s Spark Technology Center, Holden talks internationally on Spark and holds office hours at coffee shops at home and abroad. She makes frequent contributions to Spark, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of computers she enjoys dancing & playing with fire.
Holden will be presenting the following session: Understanding Spark Tuning with Auto Tuning (or how to stop your pager going off at 2am*).

Dinesh A. Joshi (SF Bay) @dineshjoshi

Dinesh A. Joshi (LinkedIn) has been a professional software engineer for over a decade, building highly scalable real-time web services and distributed streaming data processing architectures serving over 1 billion devices. Dinesh is an active contributor to the Apache Cassandra codebase. He has a master’s degree in Computer Science (Distributed Systems & Databases) from Georgia Tech in Atlanta.
Dinesh will be presenting the following session: Need for speed: Boosting Apache Cassandra's performance using Netty.

Chris Lu (SF Bay)

Chris Lu (LinkedIn / GitHub) is a lead engineer at Uber building the knowledge graph, leveraging nearly twenty years of experience at big and small companies on databases, federated query, search, data warehousing, and infrastructure for machine learning on graphs. His side projects include the SeaweedFS distributed file system, Gleam and Glow for distributed MapReduce with Golang, and DBSight for database search.
Chris will be presenting the knowledge graph session: Statistically representative graph generation and benchmarking.

William Lyon (SF Bay) @lyonwj

William Lyon is a software developer at Neo4j, the open source graph database. As an engineer on the Developer Relations team, he works primarily on integrating Neo4j with other technologies, building demo apps, helping other developers build applications with Neo4j, and writing documentation. Prior to joining Neo4j, William worked as a software developer for several startups in the real estate software, quantitative finance, and predictive API fields. William holds a master’s degree in Computer Science from the University of Montana. You can find him online at lyonwj.com.
William will present the following graph analytics session: Operationalizing Graph Analytics With Neo4j.

Lynn Pausic (Austin) @lynnpausic

As head of the Design team at Expero, co-principal and business strategist, Lynn Pausic takes multitasking to the next level. By combining expertise in strategy, innovation and design, Lynn brings the breadth and depth of complex problems to light and figures out how to break them down into useful, usable and manageable pieces that form a holistic experience.
Lynn’s extensive background in user experience ranges from designing user interfaces for wearable devices, to creating enterprise software solutions and mobile UIs, to innovating scenarios beyond the 2D screen. She has ever-growing expertise with timely topics such as Big Data, the Internet of Things, UI Design Pattern Libraries and High-Performance Computing, in industries as varied and diverse as Austin itself. Lynn’s recent clients are in agronomy, enterprise management, energy, biotechnology and other verticals.
Prior to founding Expero, Lynn earned a B.S. from Carnegie Mellon University and worked as a Director of Product Management, a Consulting Manager and a Director of Human-Computer Interaction (HCI). At Trilogy, she led the HCI team and established user-centered design as an integral part of the company’s software development process.
Lynn often speaks on user experience and design, including at Nielsen Norman Group conferences around the world. Lynn created the popular tutorial “Complex Applications & Websites” (which she co-presents with John Morkes). Lynn also has presented at Carnegie Mellon University’s HCI Institute, Cornell University’s Media Lab and ACM’s SIGCHI conference.
Lynn will present the following session: Moving Beyond Node Views.

Josh Perryman (Bryan / College Station) @joshperryman

Josh Perryman likes to play with data. Oftentimes this is implementing proprietary algorithms closer to the data for performance or scale. Sometimes it is ad-hoc investigation and analysis, a sort of exploratory querying. A few times he’s been able to leverage his experience with data engines for dramatic performance improvements. But the real joy is designing a schema for both functionality and performance, one which increases the productivity of other developers and enables a technology to solve new problems or deliver new value to the business.
Technology isn't just data, and Josh does more than just play with data. He’s worked with high performance computing (HPC) environments, taking computations from hours to minutes or seconds. He has built visualizations which deliver new insights into complex data domains. He’s managed technology personnel, both directly and indirectly, to deliver technology solutions. Josh has put together more types of technology components, software and hardware, than can be counted, because one of his fortes is solving problems by building sustainable systems.

Karthik Ramasamy (San Francisco) @karthikz

Karthik Ramasamy (LinkedIn) is the co-founder of Streamlio, a company that focuses on building next-generation real-time infrastructure. Before Streamlio, Karthik was the engineering manager and technical lead for real-time infrastructure at Twitter, where he co-created Twitter Heron. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He co-founded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL and that was acquired by Twitter. Before Locomatix, he had a brief stint at Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases, and high availability solutions for network routers that are widely deployed on the internet. Before joining Juniper, he worked extensively at the University of Wisconsin on parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were later spun off as a company acquired by Teradata. Karthik is the author of several publications, patents, and Network Routing: Algorithms, Protocols and Architectures. He has a Ph.D. in computer science from the University of Wisconsin, Madison, with a focus on big data and databases.

Petra Selmer (London) @Aethelraed

Dr. Petra Selmer is a member of the Query Languages Standards and Research group at Neo4j, undertaking research into graph query languages and language standards, with the aim of evolving and standardizing property graph querying. She also supports the openCypher project at www.opencypher.org, and was previously part of the team designing and optimizing Neo4j’s Cypher query engine. For many years, she worked as a consultant and developer in a variety of different domains and roles and has a PhD in Computer Science from Birkbeck, University of London, where she researched flexible querying of graph-structured data.
Dr. Selmer will present the following Graph Summit session: GQL: Towards a Standardized Property Graph Query Language.

Juan Sequeda (Austin) @juansequeda

Dr. Juan Sequeda is the co-founder of Capsenta, a spin-off from his research, and the Senior Director of Capsenta Labs. He holds a PhD in Computer Science from the University of Texas at Austin. His research interests lie at the intersection of Logic and Data, in particular between the Semantic Web and Relational Databases, for data integration, ontology-based data access, and semantic/graph data management. Juan is the recipient of the NSF Graduate Research Fellowship, received 2nd Place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org, Best Student Research Paper at the 2014 International Semantic Web Conference, and the 2015 Best Transfer and Innovation Project awarded by the Institute for Applied Informatics. Juan is the General Chair of AMW 2018, was the PC chair of the ISWC 2017 In-Use track, is on the Editorial Board of the Journal of Web Semantics, is a member of multiple program committees (ISWC, ESWC, WWW, AAAI, IJCAI), and is co-creator of the Consuming Linked Data Workshop series. Juan is a member of the Graph Query Languages task force of the Linked Data Benchmark Council (LDBC) and has also been an invited expert member and standards editor at the World Wide Web Consortium (W3C).

Joshua Shinavier (San Francisco) @joshsh

Joshua Shinavier is a primordial being of the graph database domain, and holds a PhD in Web science from RPI’s Tetherless World Constellation. He contributed to the first common APIs for graph databases, the original TinkerPop query language which influenced Gremlin, and the first tools which aligned the property graph and RDF data models, starting with neo4j-rdf-sail in 2008. Other graphy adventures have included Lisp hacking at Franz Inc. and Java hacking at Aurelius. As of 2017, he is part of the knowledge graph team at Uber, where he also leads a company-wide effort to unify schemas across RPC, streaming, and storage. He feels, now as ever, that the research, business, and open source communities have a lot to learn from each other with respect to graphs and knowledge representation.
Josh will present the following Graph Summit session: A Graph is a Graph is a Graph: Equivalence, Transformations, and Composition of Graph Data Models.

Michael Uschold (Seattle, WA) @UscholdM

Michael Uschold, Senior Ontology Consultant at Semantic Arts, has over twenty-five years’ experience in developing and transitioning semantic technology from academia to industry. He pioneered the field of ontology engineering, co-authoring the first paper and giving the first tutorial on the topic in 1995 in the UK.
As a senior ontology consultant at Semantic Arts since October 2010, Michael trains and guides clients to better understand and leverage semantic technology using knowledge graphs. He has built commercial enterprise ontologies in digital asset management, finance, healthcare, legal research, consumer products, electrical devices, manufacturing and corporation registration. More recently he has focused on semantic application development using SPARQL for application code and R2RML for converting relational data into a knowledge graph.
During 2008-2009, Uschold worked at Reinvent on a team that developed a semantic advertising platform that substantially increased revenue. As a research scientist at Boeing from 1997-2008 he defined, led and participated in numerous projects applying semantic technology to enterprise challenges. He is a frequent invited speaker and panelist at national and international events, and serves on the editorial board of the Applied Ontology Journal. He received his Ph.D. in AI from Edinburgh University in 1991 and an MSc. from Rutgers University in Computer Science in 1982.
Michael will present the following Graph Summit session: Breaking Down Silos with Knowledge Graphs.

Ted Wilmes (Oklahoma City) @trwilmes

Ted Wilmes, Data Architect at Expero, is a graduate of Trinity University where he studied computer science and art history. He started his professional career at a not-for-profit research and development institution where he performed contract software development work for a variety of government and commercial clients. During this time he worked on everything from large enterprise systems to smaller, cutting edge research and development projects. One of the most rewarding parts of each of these projects was the time spent collaborating with the customer.
As Ted’s career continued, he moved on to an oil and gas startup and continued to dig deeper into the data side of software development, developing a greater interest in how databases work and how to eke as much performance out of them as possible. During this time he became interested in the application of graph databases to certain problem sets. Today, at Expero, Ted enjoys putting his deep knowledge of transactional graph computing to work as he helps customers of all types navigate the burgeoning property graph database landscape.
Outside of work, Ted enjoys spending time with his family out-of-doors, listening to and playing loud music, and contributing to the Apache TinkerPop project as a committer and PMC member.
Ted will present the following Graph Summit session: High Performance JanusGraph Batch & Stream Loading.

Chris Wixon (Atlanta)

Chris Wixon, MD, is a practicing surgeon who holds the belief that improving the quality and efficiency of clinical information has the potential to have a profound effect on healthcare delivery, costs and overall outcomes.
Chris will present the following Graph Summit session: Taming of the Shrew: Using a Knowledge Graph to capture structured Health Information Data.

Dr. Mingxi Wu (Redwood City)

Dr. Mingxi Wu is the VP of Engineering at TigerGraph responsible for product development, quality assurance, and release. Mingxi excels at engineering team building and tech leadership. Previously, he worked at Ad-Tech startup Turn (acquired by Amobee), Oracle Relational Database Optimizer Group, and Microsoft SQL Server Manageability Group. He won research awards from SIGMOD, VLDB and KDD and holds patents on big data and pending patents on graph management. Mingxi received his PhD from the University of Florida, where he specialized in database and data mining.
Mingxi will present the following Graph Summit session: Eight Prerequisites of a Graph Query Language.