Who spoke at Data Day Texas 2020

Confirmed Speakers

Opening Keynote
Jesse Anderson (Reno) @jessetanderson

Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. He works with companies ranging from startups to Fortune 100 companies on Big Data. This includes training on cutting edge technologies like Apache Kafka, Apache Hadoop and Apache Spark. He has taught over 30,000 people the skills to become data engineers. He is widely regarded as an expert in the field and for his novel teaching practices. Jesse is published on O’Reilly and Pragmatic Programmers. He has been covered in prestigious publications such as The Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at Jesse-Anderson.com.
Jesse will present the following data science session: Working Together as Data Teams.
On the day before (Friday), Jesse will also be hosting a full day Professional Kafka Development Workshop. You can register for the class here.

AI Health Data Keynote
Dr. Ying Ding (Austin)

Dr. Ying Ding is the Bill & Lewis Suit Professor of Information Technology at the University of Texas School of Information. Before that, she was a professor and director of graduate studies for the data science program at the School of Informatics, Computing, and Engineering at Indiana University. She led the effort to develop the online data science graduate program for Indiana University. She also worked as a senior researcher at the Department of Computer Science, University of Innsbruck (Austria) and the Free University of Amsterdam (the Netherlands). She has been involved in various NIH, NSF and European Union funded projects. She has published 240+ papers in journals, conferences, and workshops, and served as a program committee member for 200+ international conferences. She is the co-editor of the book series Semantic Web Synthesis, published by Morgan & Claypool, the co-editor-in-chief of Data Intelligence, published by MIT Press and the Chinese Academy of Sciences, and serves as an editorial board member for several top journals in Information Science and Semantic Web. She is the co-founder of Data2Discovery, a company advancing cutting edge AI technologies in drug discovery and healthcare. Her current research interests include data-driven science of science, AI in healthcare, Semantic Web, knowledge graphs, data science, scholarly communication, and the application of Web technologies.
Professor Ding will present the AI Health Data keynote session: Knowledge Graph for Drug Discovery.

Human in the Loop Keynote
Robert Munro (San Francisco) @WWRob

Robert Munro (LinkedIn) is an expert in combining Human and Machine Intelligence, working with Machine Learning approaches to Text, Speech, Image and Video Processing. Robert has founded several AI companies, building some of the top teams in Artificial Intelligence. He has worked in many diverse environments, from Sierra Leone, Haiti and the Amazon, to London, Sydney and Silicon Valley, in organizations ranging from startups to the United Nations. In addition to publishing more than 50 papers, Robert is the author of the upcoming Manning publication Human in the Loop Machine Learning. He has a PhD from Stanford University.
Rob will present the following Human in the Loop keynote: Human Centered Machine Learning.

Database Keynote
Dr. Marko A. Rodriguez (Santa Fe) @twarko

Dr. Marko A. Rodriguez (LinkedIn) is a graph and stream computing specialist currently focused on designing stream-based virtual machines for processing graph-based structures within distributed computing environments. Marko is the co-founder of Apache TinkerPop where he is developing the next generation TinkerPop4 virtual machine and bytecode specification that will enable the natural integration of any data processor and query language. Marko is also the founder of RReduX which, along with developing TinkerPop4, is designing a universal distributed computer called GMachine. Dr. Rodriguez received his Ph.D. in computer science from the University of California at Santa Cruz and was a Director's Fellow at the Center for Nonlinear Studies at the Los Alamos National Laboratory.
Dr. Rodriguez will present the following session: mm-ADT : A Multi-Model Abstract Data Type

Knowledge Graph Keynote
Juan Sequeda (Austin) @juansequeda

Dr. Juan Sequeda is the co-founder of Capsenta, a spin-off from his research, and the Senior Director of Capsenta Labs. He holds a PhD in Computer Science from the University of Texas at Austin. His research interests are on the intersection of Logic and Data and in particular between the Semantic Web and Relational Databases for data integration, ontology based data access and semantic/graph data management. Juan is the recipient of the NSF Graduate Research Fellowship, received 2nd Place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org, Best Student Research Paper at the 2014 International Semantic Web Conference and the 2015 Best Transfer and Innovation Project awarded by Institute for Applied Informatics. Juan is the General Chair of AMW 2018, was the PC chair of the ISWC 2017 In-Use track, is on the Editorial Board of the Journal of Web Semantics, member of multiple program committees (ISWC, ESWC, WWW, AAAI, IJCAI) and co-creator of the Consuming Linked Data Workshop series. Juan is a member of the Graph Query Languages task force of the Linked Data Benchmark Council (LDBC) and has also been an invited expert member and standards editor at the World Wide Web Consortium (W3C).
Juan will present the following session: A Brief History of Knowledge Graph's Main Ideas

TinkerPop Keynote
Joshua Shinavier (San Francisco) @joshsh

Joshua Shinavier is a primordial being of the graph database domain. As a co-founder of what is now Apache TinkerPop, he contributed to the first common APIs for graph databases, the original TinkerPop query language which influenced Gremlin, and the first tools which aligned the property graph and RDF data models, starting with neo4j-rdf-sail in 2008. At Uber, he leads the company-wide effort to unify data models and schemas across RPC, streaming, and storage. The scope of this effort includes developing standardized schemas, propagating standardized schemas throughout the company's infrastructure, developing mappings to integrate data across languages and environments, and getting as much as possible of Uber's data connected in the form of a graph of entities and relationships, facilitating data discovery and automated query planning. Joshua holds a PhD in computer science from RPI's Tetherless World Constellation, where he took the opportunity to explore the strange no man's land between graphs, cognition, and augmented reality. He feels, now as always, that the research, business, and open source communities have a lot to learn from each other with respect to graphs and knowledge representation.
Josh will present the following TinkerPop Keynote: TinkerPop 2020.

Jans Aasman (SF Bay)

Jans Aasman (Wikipedia / LinkedIn) is a Ph.D. psychologist and expert in Cognitive Science - as well as CEO of Franz Inc., an early innovator in Artificial Intelligence and provider of the graph database, AllegroGraph. As both a scientist and CEO, Dr. Aasman continues to break ground in the areas of Artificial Intelligence and Knowledge Graphs as he works hand-in-hand with numerous Fortune 500 organizations as well as US and foreign governments. Jans recently authored an IEEE article on “Enterprise Knowledge Graphs”.
Dr. Aasman spent a large part of his professional life in telecommunications research, specializing in applied Artificial Intelligence projects and intelligent user interfaces. He gathered patents in the areas of speech technology, multimodal user interaction, and recommendation engines while developing precursor technology for tablets and personal assistants. He was also a professor in the Industrial Design department of the Technical University of Delft. Dr. Aasman is a noted conference speaker at such events as Smart Data, NoSQL Now, International Semantic Web Conference, GeoWeb, AAAI, Enterprise Data World, Text Analytics, and TTI Vanguard, to name a few.
Dr. Aasman will present the following session: Creating Explainable AI With Rules.

Jeff Carpenter (Scottsdale, Arizona) @jscarp

Jeff Carpenter (Linkedin), co-author of Cassandra: The Definitive Guide (3rd edition available soon!), has worked on large-scale systems in the defense and hospitality industries. Jeff leads the Developer Advocate team at DataStax, where he uses his background in system architecture, microservices and Apache Cassandra to help empower developers and operations engineers to build distributed systems that are scalable, reliable, and secure.
Jeff will be giving the following Cassandra presentation: Cassandra 4.0 In Action

Sanghamitra Deb (SF Bay) @sangha_deb

Sanghamitra Deb is a Data Scientist at Chegg, where she works on problems related to school and college education to sustain and improve the learning process. Her work involves recommendation systems, graph modeling, deep NLP analysis, data pipelines and machine learning. Previously, Sanghamitra was a data scientist at Accenture, where she worked on a wide variety of problems related to data modeling, architecture and visual storytelling. Sanghamitra is active in Data Science outreach and believes in applying analytics to a range of domains such as pharma, HR, customer support, market research, etc. Prior to being a data scientist, she was an astrophysicist who studied the structure of the universe by modeling galaxy clusters.
Sanghamitra will present the following session: How to start your first computer vision project.

Justin Fine (Los Angeles)

Justin Fine is based in Los Angeles, CA and is a Sales Engineer working mainly in the SoCal region. His academic background is in applied mathematics, and he has worked with graphs for over 12 years in many different verticals while consulting (federal, telecoms, financial, etc). During his time as a consultant, his focus was mainly advanced analytics utilizing NoSQL technologies. He recently came from Microsoft's Azure team, where he was a Data Solution Architect, and is very excited to be part of the Neo4j family! When Justin isn't nerding he enjoys scotch, cigars, and reading with his cat Penny.
Justin will co-present the following session: Graph Feature Engineering for More Accurate Machine Learning (90 minute workshop)

Alon Gavra (Israel)

Alon Gavra is a platform team lead at AppsFlyer. Originally a backend developer, he’s transitioned to lead the real time infrastructure team and took on the role of bringing some of the most heavily used infrastructure in AppsFlyer to the next level. A strong believer in sleep-driven design, Alon’s main focus is stability and resiliency in building massive data ingestion and storage solutions.
Alon will present the following Data Engineering session: Managing your Kafka in an explosive growth environment.

Anna Lisa Gentile (San Jose) @anligentile

Dr. Anna Lisa Gentile (LinkedIn) is a Researcher at IBM Research Almaden. Her research is principally focused on studying methods and techniques for semantically annotating unstructured and semi-structured content. Her main Research Areas are Information Extraction (IE), Natural Language Processing (NLP) and Semantic Web. She obtained her PhD with a thesis on Named Entity Disambiguation at the University of Bari, Italy in 2010. She has published more than 60 peer-reviewed scientific publications including papers at major venues such as LREC, EMNLP, ESWC and ISWC. She has served as an Organizing Committee member for conferences such as ISWC, ESWC, WWW amongst many others, and has organized workshop series such as LD4IE on Linked Data for Information Extraction and HumBL on Augmenting Intelligence with Bias-Aware Humans-in-the-Loop.
Dr. Gentile will present the following Human in the Loop session: Information Extraction with Humans in the Loop.

Tanner Gilligan (SF Bay)

Tanner Gilligan has a passion for both AI and software architecture, and combines these skills in order to lead the platform development at Sculpt. Originally, Tanner grew up in Minnesota where he worked full time during high-school, and moved to California for college and better weather. He completed his B.S. and M.S. in Computer Science (AI) at Stanford University in only four years, and graduated with distinction for being at the top of his class. Prior to Sculpt, he gained hands-on AI experience at TrueCar, Oracle, and several clients while working as a ML consultant.
Tanner will co-present the following Human in the Loop session: From Stanford to Startup: making academic human-in-the-loop technology work in the real world.

Tyler Glaittli (Midvale, Utah)

Tyler Glaittli is a Business Systems Analyst on the Enterprise Data Management team at CHG Healthcare. He's passionate about bridging the gap between business users and technology and has a knack for simplifying and modeling complex systems. With help from Graphileon, he is taming, visualizing, and revealing the complex relationships in 40 years of healthcare staffing data. Outside of work, he's the General Manager of Random Tangent Improv Comedy, a nonprofit that teaches and performs the art of improvised comedy.
Tyler will co-present the following Global Graph Summit session: Managing Relationships in the Healthcare Industry with Graphileon: A CHG Healthcare Use Case.

Abe Gong (San Francisco) @AbeGong

Abe Gong (LinkedIn) is CEO and cofounder at Superconductive Health. A seasoned entrepreneur, Abe has been leading teams using data and technology to solve problems in healthcare, consumer wellness, and public policy for over a decade. Previously, he was chief data officer at Aspire Health, the founding member of the Jawbone data science team, and lead data scientist at Massive Health. Abe holds a PhD in public policy, political science, and complex systems from the University of Michigan. He speaks and writes regularly on data science, healthcare, and the internet of things.
Abe will co-present the following Data Engineering session: Fighting pipeline debt with Great Expectations.

Dr. Denise Koessler Gosnell (Charleston) @DeniseKGosnell

Dr. Denise Gosnell leads a team at DataStax which builds some of the largest distributed graph applications in the world. Her passion centers on examining, applying, and evangelizing the applications of graph data and complex graph problems. As an NSF Fellow, Dr. Gosnell earned her Ph.D. in Computer Science from the University of Tennessee. Her research coined the concept of "social fingerprinting" by applying graph algorithms to predict user identity from social media interactions. Since then, Dr. Gosnell has built, published, patented, and spoken on dozens of topics related to graph theory, graph algorithms, graph databases, and applications of graph data across all industry verticals.
Dr. Gosnell will be hosting the following session: Modeling, Querying, and Seeing Time Series Data within a Self-Organizing Mesh Network.

Sony Green (San Francisco)

Sony Green is co-founder and Director of Business Development at Kineviz. After receiving his BFA in sculpture from the Rhode Island School of Design, he studied 3D animation at the Vancouver Film School and went on to work for LucasArts, Salesforce, the CDC, Yahoo, IGN, and multiple tech startups. He brings Kineviz’ clients and partners a cross-disciplinary approach to data visualization and analytics.
Sony will be hosting the following session: Crime Analysis with Visual Graph Transformation.

Sijie Guo (San Francisco) @sijieg

Sijie Guo (Linkedin / GitHub) is the founder and CEO of StreamNative. StreamNative is a data infrastructure startup offering a cloud native event streaming platform based on Apache Pulsar for enterprises. Previously, he was the tech lead for the Messaging Group at Twitter and worked on push notification infrastructure at Yahoo. He is also the VP of Apache BookKeeper and a PMC Member of Apache Pulsar.
Sijie will be hosting the following session: Building a streaming data warehouse using Flink and Pulsar.

Brian Hall (Austin) @brian_w_hall

Brian Hall (linkedin) leads the Graph and Analytics Practice at Expero, with consulting expertise across a wide array of graph engines including JanusGraph, DataStax, Neo4j, TigerGraph, Neptune, and CosmosDB.
Brian’s focus has always been in the application of technology to concrete business problems. He started his career with “fingers on keyboards”, writing code for government agencies, financial markets and commercial software companies. Now he enjoys consulting for clients to help them apply technology and processes to make their businesses better. He has a B.S. in Computer Science from Vanderbilt University and an M.S. in Computer Science from DePaul University.
In their free time, Brian and his wife, Nicole, enjoy watching their kids slowly turn into the adults they will become, traveling, eating good food with good wine, and staying active in Austin with all the outdoor activities and great live music.
Brian will present the following session: Scaling Your Cassandra Cluster For Fluctuating Workloads.

Ethan Hasson (Austin)

Ethan Hasson (LinkedIn) is a Senior Software Engineering Consultant with a focus on front end technologies and architecture. He has professionally built frontend applications for a number of years at SnapTrends, HumanGeo, and Expero. When not building applications professionally, he enjoys building video games, playing games of all types, going on hikes, and camping.
Ethan will be co-presenting the following session: Building a Graph User-Interface for Malware-Analysis.

Amy Hodler (Kettle Falls, Washington) @amyhodler

Amy Hodler is a network science devotee and AI and Graph Analytics Program Manager at Neo4j. She promotes the use of graph analytics to reveal structures within real-world networks and predict dynamic behavior. She is the co-author of the O’Reilly book, Graph Algorithms: Practical Examples in Apache Spark and Neo4j. Amy helps teams apply novel approaches to generate new opportunities at companies such as EDS, Microsoft, Hewlett-Packard (HP), Hitachi IoT, and Cray Inc. Amy has a love for science and art with a fascination for complexity studies and graph theory.
Amy will present the following sessions:
Responsible AI Requires Context and Connections.

Graph Feature Engineering for More Accurate Machine Learning (90 minute workshop)

Rick Houlihan (Austin)

Rick Houlihan is a principal technologist and leads the NoSQL blackbelt team at AWS and has designed hundreds of NoSQL database schemas for some of the largest and most highly scaled applications in the world. Many of Rick’s designs are deployed at the foundation of core Amazon and AWS services such as CloudTrail, IAM, CloudWatch, EC2, Alexa, and a variety of retail internet and fulfillment-center services. Rick brings over 25 years of technology expertise and has authored nine patents across a diverse set of technologies including complex event processing, neural network analysis, microprocessor design, cloud virtualization, and NoSQL technologies. As an innovator in the NoSQL space, Rick has developed a repeatable process for building real-world applications that deliver highly efficient denormalized data models for workloads of any scale, and he regularly delivers highly rated sessions at re:Invent and other AWS conferences on this specific topic.
Rick will present the following Data Engineering and Architecture session: Where’s my lookup table? Modeling relational data in a denormalized world.

Sanjay Joshi (Seattle)

Sanjay Joshi (Linkedin) is the Industry CTO, Healthcare at Dell EMC. Based in Seattle, Sanjay's career has spanned the entire gamut of life-sciences and healthcare from clinical and biotechnology research to healthcare informatics to medical devices.
A "skunkworks" engineer, bioengineer and informaticist, he defines himself as a "non-reductionist" with a "systems view of the world.” His current focus is a systems-level understanding of Healthcare from the Edge to the Cloud via Genomics, Proteomics, Microbiomics, Imaging and IoT processes and data infrastructures.
Recent experience has included AI platforms, data management and instruments for Electronic Medical Records; Proteomics and Flow Cytometry; FDA and HIPAA validations; Lab Information Management Systems (LIMS); Translational Genomics research and Imaging. Sanjay holds a patent in multi-dimensional flow cytometry analytics. He began his career developing and building X-Ray machines. Sanjay was the recipient of a National Institutes of Health (NIH) Small Business Innovation Research (SBIR) grant and has been a consultant or co-Principal-Investigator on several NIH grants. He is actively involved in non-profit biotech networking and educational organizations in the Seattle area and beyond. Sanjay holds a Master of Biomedical Engineering from the University of New South Wales, Sydney and a Bachelor of Instrumentation Technology from Bangalore University. He completed several medical school and PhD level courses (in Sydney and Seattle).
Sanjay will be presenting the following session: Time-Series analysis in healthcare: A practical approach.

Devangana Khokhar (Bengaluru)

Devangana Khokhar (Linkedin) is a senior data scientist and strategist at ThoughtWorks. She brings 6+ years of experience in building intelligent systems and defining data strategy for clients across multiple domains and geographies. Devangana has a research background in theoretical computer science, information retrieval, and social network analysis. She’s written a book on network sciences titled Gephi Cookbook (Packt Publishing London). Her interests include data privacy and security, the role of data in humanitarian sector, ethics and responsibilities around data, reinforcement learning, and data-driven intelligence in low-resource settings. Devangana frequently consults for and guides nonprofit organizations and social enterprises on the value of data literacy and holds workshops and boot camps on various dimensions of data. She earned her master’s degree in theoretical computer science specializing in social network analysis from PSG College of Technology, Coimbatore, India.
Devangana will be presenting the following Texas AI Summit session: Data Governance and FATTER AI.

Brad Klingenberg (San Francisco)

Brad Klingenberg is the Chief Algorithms Officer at Stitch Fix, an online personal styling service that helps people find the clothes they love through personalized shipments of apparel, shoes and accessories. Brad and his team use statistics, machine learning and human-in-the-loop algorithms to optimize the Stitch Fix client experience, the management of inventory and the selection of items for clients. Prior to joining Stitch Fix, Brad received his PhD in Statistics from Stanford University and worked as a data scientist in technology and financial services.
Brad will present the featured Human in the Loop session: Humans, machines and disagreement: Lessons from production at Stitch Fix.

Sean Knapp (SF Bay) @seanknapp

Sean Knapp is the founder and CEO of Ascend.io.
Prior to Ascend.io, Sean was a co-founder, CTO, and Chief Product Officer at Ooyala. At Ooyala, Sean played key roles in raising $120M, scaling the company to 500 employees, Ooyala's $400M+ acquisition, and Ooyala's subsequent acquisitions of Videoplaza and Nativ. He oversaw all Product, Engineering and Solutions, and defined Ooyala's product vision for their award-winning analytics and video platform solutions.
Before founding Ooyala, Sean worked at Google where he was the technical lead for Google's legendary Web Search Interface team, helping that team increase Google revenues by over $1B. Sean also developed and launched iGoogle, the company's popular, customizable home page. Sean has both B.S. and M.S. degrees in Computer Science from Stanford University.
Sean will be presenting the following Data Engineering session: How Declarative Configurations and Automation Can Prevent A Data Mutiny.

Brad Knox (Austin)

Brad Knox co-leads the Bosch Learning Agents Lab, which is housed at UT Austin and focuses on the development of machine learning algorithms for autonomous driving. Brad was an early pioneer of human-in-the-loop reinforcement learning, the topic of his PhD dissertation at UT. His postdoctoral research at the MIT Media Lab focused on creating interactive characters through machine learning on puppetry-style demonstrations of interaction. Before joining Bosch, Brad founded and sold his startup Bots Alive, working in the toy robotics sector. He has won multiple best paper awards, the 2012 best dissertation award for the UT Austin Department of Computer Science, and was named to IEEE Intelligent Systems' AI 10 to Watch in 2013.
Brad will present the following Human in the Loop session: Learning sequential tasks from human feedback.

Clay Lambert (Cheyenne, Wyoming)

Clay Lambert is a Confluent Certified Developer, a team leader at Expero Inc., and an in-the-trenches technical expert. He has been developing custom software solutions in a broad range of industries for over 34 years, for industry leaders in Aerospace, Real Estate, Oil and Gas, Messaging (Co-developer of the ColdSpark Mail engine, the first Mail Transport router and Application Server), Medical, Telecom, Broadband, Commodity Trading, and much more. For the past 24 years his emphasis has been on providing solutions in the highly scalable, highly available, fault tolerant, massively distributed computing space, focused primarily on Java as well as other prominent technology stacks, with special expertise in complex databases, content management, Big Data, high performance networking, and infrastructure.
Clay will present the following Data Engineering session: Stateful Streaming Application Integration - Leveraging Kafka Streams Processor API with KSQL DB.

Dale Markowitz (Austin)

Dale Markowitz is an Applied AI Engineer and Developer Advocate at Google, where she tries to make machine learning tools for developers as intuitive and easy-to-use as possible. Before that, she worked as a software engineer in Google Research and before that at the online dating site OkCupid. She holds a degree in Computer Science from Princeton University.
Dale will present the following ML/AI session: Shining a Light on Dark Documents.

Timo Mechler (DFW)

Timo Mechler is a Product Manager and Architect at SmartDeployAI. He has close to a decade of financial data modeling experience working both as an analyst and strategist in the energy commodities sector. At SmartDeployAI he now works closely with the engineering teams to solve interesting data modeling challenges.
Timo will co-present the following session: Creating Cloud-Native Machine Learning Workflows on Kubernetes using ScyllaDB Backend as Persistent Storage.

Charles Adetiloye (DFW)

Charles Adetiloye is a Lead ML platforms engineer at SmartDeployAI. He has well over 15 years of experience building large-scale, distributed applications. He has always been interested in building distributed Event-Driven systems that are composable from independent asynchronous subsystems. He has extensive experience working with Kubernetes, and NoSQL databases like ScyllaDB and Cassandra.
Charles will co-present the following session: Creating Cloud-Native Machine Learning Workflows on Kubernetes using ScyllaDB Backend as Persistent Storage.

Rob McDaniel (Seattle)

Rob McDaniel is the co-founder and CTO at Sigma IQ, where he leads the development of the world's first fully machine-learned matching engine for enterprise-scale account reconciliation. He cut his teeth with startups during the height of the dot-com crash (oops!) where he and his brother successfully bootstrapped and sold enterprise network solutions into major ISPs. Rob then spent 10 years at Microsoft working on Windows Phone and Excel, before moving on to build machine learning systems at PayScale and local startups. Most recently, Rob managed Applied Sciences at Rakuten, where his team unified R&D across international markets and taxonomies. He also once worked in a lollipop factory.
Raised in a family of engineers, Rob grew up with computers and was the only one of his friends with a bang path. Despite a brief and confused romance with physics (and alpine climbing), ultimately it was his love of math which drove him to discover machine learning, where he fell in love with NLP and semantics, and ultimately graphs and topologies, which of course is where all roads lead because #GraphsAreEverywhere.
He loves difficult challenges, both physical and mental. He is the kind of person who says “please” and “thank you” to computers, and in his spare time he enjoys studying math and linguistics, and building things.
Rob will present the following session: Immutable Data Pipelines for Fun and Profit.

Gian Merlino (San Francisco) @gianmerlino

Gian Merlino is CTO and co-founder of Imply, a San Francisco-based technology company, and a committer on Apache Druid. Previously, Gian led the data ingestion team at Metamarkets (now a part of Snapchat) and held senior engineering positions at Yahoo. He holds a BS in Computer Science from Caltech.
Gian will present the following Data Engineering session: Fresh and Fast with Apache Druid: Analytics for Real-time and Ad-hoc Applications.

Jonathan Mugan (Austin) @jmugan

Jonathan Mugan (Linkedin) is a researcher specializing in artificial intelligence, machine learning, and natural language processing. His current research focuses in the area of deep learning for natural language generation and understanding. Dr. Mugan received his Ph.D. in Computer Science from the University of Texas at Austin. His thesis was centered in developmental robotics, which is an area of research that seeks to understand how robots can learn about the world in the same way that human children do. Dr. Mugan also held a post-doctoral position at Carnegie Mellon University, where he worked at the intersection of machine learning and human-computer interaction. One of the most requested speakers at the Data Day Texas conferences, he recently also spoke on the topic of NLP at the O’Reilly AI conference, and is the creator of the O’Reilly video course Natural Language Text Processing with Python. Dr. Mugan is also the author of The Curiosity Cycle: Preparing Your Child for the Ongoing Technological Explosion.
Jonathan will be presenting the following TensorFlow session: Moving Your Machine Learning Models to Production with TensorFlow Extended

Emanuel Ott (Hannover, Germany)

Emanuel Ott leads iMerit’s solutions team. Over the past 6 years, he and iMerit’s US/India-based teams have been fueling the world’s leading augmented reality, natural language processing, and self-driving car companies with human annotated ground truth to power their efforts to “see and understand the world”. Emanuel's ceaseless curiosity for innovative technological solutions to human problems has led him to redefine client annotation requirements, improve their data pipelines, and empower them to leverage and hone the power of HITL workflows involving iMerit's workforce of nearly 3,000 in-house employees.
Emanuel will present the following Human in the Loop session: How to trust your Human-In-The-Loop (HITL) data annotations.

Arvind Prabhakar (SF Bay) @aprabhakar

Arvind Prabhakar is cofounder and CTO of StreamSets, provider of the industry’s first DataOps platform for modern data integration. He’s an Apache Software Foundation member and a PMC member on Flume, Sqoop, Storm, and MetaModel projects. Previously, Arvind held many roles at Cloudera, ranging from software engineer to director of engineering.
Arvind will present the following Data Engineering session: Deploying DataOps for analytics agility.

Jennifer Prendki (San Francisco) @jlprendki

Jennifer Prendki is the founder and CEO of Alectio and has spent a large part of her career promoting the importance of creating a better approach to Machine Learning Lifecycle Management. Her current focus is on helping ML teams build better models with less data. Prior to founding Alectio, she was the VP of Machine Learning at Figure Eight, one of the industry leaders in data labeling (recently acquired by Appen); she also headed Machine Learning at Atlassian and various Data Science initiatives on the Search team at Walmart Labs. She is also known for her active support of women in STEM and Technology.
Jennifer will present the following Human in the Loop session: Cost-Optimized Data Labeling Strategy.

Karthik Ramasamy (San Francisco) @karthikz

Karthik Ramasamy (LinkedIn) is the co-founder of Streamlio - a company that focuses on building next generation real time infrastructure. Before Streamlio, Karthik was the engineering manager and technical lead for real-time infrastructure at Twitter where he co-created Twitter Heron. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He co-founded Locomatix, a company that specializes in real-time streaming processing on Hadoop and Cassandra using SQL, that was acquired by Twitter. Before Locomatix, he had a brief stint with Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases, and high availability solutions for network routers that are widely deployed on the internet. Before joining Juniper, at the University of Wisconsin he worked extensively in parallel database systems, query processing, scale out technologies, storage engines, and online analytical systems. Several of these research projects were later spun off as a company acquired by Teradata. Karthik is the author of several publications, patents, and Network Routing: Algorithms, Protocols and Architectures. He has a Ph.D. in computer science from the University of Wisconsin, Madison with a focus on big data and databases.
Karthik will co-present the following session: AI/ML Model Serving using Apache Pulsar Functions
and the following session: Using interactive Querying of Streaming Data for Anomaly Detection

Clark Richey (Laurel, Maryland)

Clark Richey is the Chief Technology Officer at FactGem. He has over 20 years of experience designing and developing software, primarily for the defense and intelligence sectors. He has also taught in the master’s program at Loyola University and undergraduate program at UMBC. Clark has investigated non-traditional methods and technologies that use data more efficiently for over 10 years.
Clark will present the following session: Data Journeys: From Document DB to RDF to Property Graph
Clark will also be presenting at the Graph Showcase on Friday.

Paige Roberts (Austin) @RobertsPaige

In two decades in the data management industry, Paige Roberts (Linkedin), Open Source Relations Manager at Vertica, has worked as an engineer, a trainer, a marketer, a product manager, and a consultant. Now, she promotes understanding of Vertica, MPP data processing, open source, and how the analytics revolution is changing the world.
Paige is a total geek who is into role-playing games, LARP’ing in the SCA, Doctor Who, superheroes, space exploration, comics, Tolkien, etc. Paige writes and publishes fantasy and science fiction stories under her maiden name Paige E. Ewing. She won the Kennedy Space Center’s global Space Apps Challenge three years ago for coming up with an idea for growing food on Mars, and she's a pretty mean shot with a recurve, crossbow, or long bow.
Paige will present the following Data Analytics session: Architecting Production IoT Analytics.

Shawn Rutledge (Seattle)

Shawn Rutledge is an accomplished machine learning practitioner with over a decade of experience building analytics solutions across verticals as diverse as financial services, travel, and social media. He is currently Chief Scientist at Sigma IQ, an early-stage Fintech startup. Before that, he was Principal Scientist at kFold Enterprises, a machine learning products and services firm he founded in 2009. While he has served as a technology executive for IBM, First Data, and Expedia, he is most at home in the startup environment and has been a principal contributor, leader, advisor, and angel investor to more than a dozen Seattle area startups. Shawn holds a bachelor's degree in Computer Science and has completed graduate coursework in Statistics at Stanford.
Shawn will present the following Texas AI Summit session: Machine Learning Counterclockwise.

Dr. Bivin Sadler (Dallas)

Originally from Dallas, Texas, Dr. Bivin Sadler finished a BS in mathematics magna cum laude from Texas Tech University before beginning his professional career in Scottsdale, Arizona, at Motorola. He worked as a statistician and software engineer for 2.5 years, working primarily on a companywide tool to predict when software projects could be released with optimal statistical properties (Six Sigma). Upon completion of the project, he moved to San Diego, and while playing professional beach volleyball for two years, finished a master’s degree in applied math at San Diego State University. He then moved back to Dallas to earn a PhD in statistics from SMU and finished his degree in 2014 after winning the Walsh Award for the top score on the qualifying exam taken after the third year of coursework.
Dr. Sadler was hired as part of the faculty at SMU after graduation and began a dual appointment teaching both undergraduate and graduate classes in the statistics department and online with the recently formed Master of Science in Data Science (MSDS) program. Academically, he has presented his work in item response theory at various conferences and is currently working on several domestic and international consulting projects. He became a full-time member of the MSDS faculty in August 2018 and, in addition to consulting projects and teaching, actively contributes towards developing new courses and enhancing existing ones at the SMU MSDS program.
Dr. Sadler will present the following session: Deep Learning and the Analysis of Time Series Data.

Shioulin Sam (New York, NY)

Shioulin Sam is a research engineer at Cloudera Fast Forward Labs, where she bridges academic research in machine learning with industrial applications. Previously, she managed a portfolio of early stage ventures focusing on women-led startups and public market investments and worked in the investment management industry designing quantitative strategies. She holds a PhD in electrical engineering and computer science from the Massachusetts Institute of Technology.
Shioulin will present the following session: Learning with limited labeled data.

Brent Schneeman (Austin) @schnee

Brent Schneeman swipes right for science and seeks to strengthen the scientific method muscle in whatever group he finds himself. Operating from a “lead by example” mindset, Brent frequently rolls up his sleeves and writes code to help bring predictive models to business problems. Passionate about building great teams and cultures, he’s pretty sure that a “servant leadership” posture is the right posture in his personal and professional lives.
Professionally, he tends to look after teams of data- and machine-learning-oriented contributors (analysts, scientists, and engineers) who collaborate on diverse sets of machine learning projects such as continuous optimization, customer churn prediction, fraud detection, and applying diverse techniques to unstructured data. Brent has worked at Vrbo, PayPal, Visa, and other small and large companies in individual contributor or management roles, mostly in product development organizations. He currently is attempting to make the world safe for machine learning with Alegion.
A storyteller, Brent has presented at the UT McCombs School, South By Southwest, NLP Day, multiple Data Days, and various meetups. He has one degree in Mathematics and another in Electrical Engineering and lives in Austin, Texas, with his wife, three kids, two cats and one dog. While he spends most of his free time mowing the lawn, he enjoys making photographs and running around downtown, and occasionally tries to make sense of neural network architectures.
Brent will present the following DataSci / ML session: Bigger data vs. better math: which is most effective in ML?.

Rosaria Silipo (Zürich) @DMR_Rosaria

Rosaria Silipo (LinkedIn), Principal Data Scientist at KNIME, is the author of 50+ technical publications, including her most recent book “Practicing Data Science: A Collection of Case Studies”. She holds a doctorate degree in bio-engineering and has spent 25+ years working on data science projects for companies in a broad range of fields, including IoT, customer intelligence, the financial industry, and cybersecurity.
Rosaria will present the following session: Practicing data science: A collection of case studies.

Abraham Starosta (SF Bay)

Abraham Starosta is passionate about democratizing AI. Prior to Sculpt, he enjoyed researching NLP, multitask learning and weak supervision at Stanford. He has built text intelligence products for financial institutions at Primer AI, and co-founded the technical recruiting agency for startups Human Capital. He finished his B.S. and M.S. in Computer Science from Stanford specializing in AI and NLP.
Abraham will co-present the following Human in the Loop session: From Stanford to Startup: making academic human-in-the-loop technology work in the real world.

Gaja Vaidyanatha (Austin, Texas)

Gaja Vaidyanatha is a seasoned data practitioner with a 27+ year proven track record of managing and integrating large data footprints, on-premises and on the Cloud, across verticals such as Finance, Banking, Retail, Healthcare, High-Tech, Govt. and Utilities in the Americas, Europe and Asia. He is passionate about data integration and building cloud-native serverless data pipelines for Analytics, Machine Learning and AI. Most recently, he headed Data Services for the Global Private Banking division of HSBC.
Gaja will present the following Data Engineering session: Serverless Data Integration

Heidi Waterhouse (Minneapolis) @wiredferret

Heidi Waterhouse (Linkedin / Medium) is a developer advocate with LaunchDarkly. She delights in working at the intersection of usability, risk reduction, and cutting-edge technology. One of her favorite hobbies is talking to developers about things they already knew but had never thought of that way before. She sews all her conference dresses so that she’s sure there is a pocket for the mic.
Heidi will be giving the following presentation: The Death of Data: 2019.

Ryan Wisnesky (Cambridge, Massachusetts)

Ryan Wisnesky (LinkedIn) obtained B.S. and M.S. degrees in mathematics and computer science from Stanford University and a Ph.D. in computer science from Harvard University, where he studied the design and implementation of provably correct software systems. Previously, he was a postdoctoral associate in the MIT department of mathematics, where he developed the categorical query language CQL. He currently leads open-source and commercial development of CQL as CTO of Conexus AI. He maintains an active collaboration with the information-integration department of IBM Research, where he contributed to the Clio, Orchid, and HIL projects.
Ryan will present the following database session: Theory for the Working Database Programmer

Corey Zumar (SF Bay)

Corey Zumar (LinkedIn) is a software engineer at Databricks, where he’s working on machine learning infrastructure and APIs for model management and production deployment. Corey is also an active contributor to MLflow. He holds a master’s degree in computer science from UC Berkeley. At UC Berkeley’s RISELab, he was one of the lead developers of Clipper, an open source project and research effort focused on high-performance model serving.
Corey will present the following Data Engineering and Architecture session: MLflow: An open platform to simplify the machine learning lifecycle.

Thomas Cook (Austin) @Datta_Boy

Thomas Cook is Director of Sales at Cambridge Semantics. He has a master’s degree in Computer Science from Texas State University and brings 20+ years of experience in Software Engineering, ETL, Data Warehousing and Big Data. Prior to joining Cambridge Semantics, Thomas worked at SAS, IBM, Netezza and Talend.
Thomas will present the following 90 minute workshop: Intro to RDF/SPARQL and RDF*/SPARQL*

Chris Davis (Dallas) @phoo

Dr. Chris Irwin Davis is a professor of computer science at the University of Texas at Dallas who teaches database theory and design. He also has 15 years of experience working for Fortune 500 companies in data management and the software development lifecycle.
Chris will present the following session: Automated Encoding of Knowledge from Unstructured Natural Language Text into a Graph Database

Jonathan Ellis (Austin) @spyced

Jonathan Ellis is CTO and co-founder at DataStax. Prior to DataStax, Jonathan worked extensively with Apache Cassandra while employed at Rackspace. Prior to Rackspace, Jonathan built a multi-petabyte, scalable storage system based on Reed-Solomon encoding for backup provider Mozy.
Jonathan will be presenting the Cassandra keynote for the Distributed Data track: The Next Five Years in Databases.

Michael Grove (Washington, DC) @mikegrovesoft

Michael Grove is VP of Engineering and co-founder of Stardog, where he oversees the development of the Stardog Knowledge Graph Platform. Michael studied Computer Science at the University of Maryland and is an alumnus of its well-regarded MIND Lab, which specialized in semantic technologies. Before Stardog, he worked at Fujitsu Research on the use of graphs and semantic technologies in pervasive computing environments. Michael is an expert in large scale database and reasoning systems and has worked with graphs and graph databases for nearly fifteen years.
Mike will present the following Global Graph Summit talk: How to use a Knowledge Graph to Improve the Search Experience

Stefan Hausotte (Bochum, North Rhine-Westphalia) @_secana_

Stefan Hausotte is the team lead for “Automated Threat Analysis” at G DATA, where he plans and coordinates the development of automated malware analysis and classification tools. He is an active open source committer in different projects related to security, .NET and software development areas. Furthermore, he teaches IT-Security at the Technical University of Dortmund and is a frequent speaker about security related topics at conferences and fairs. He believes that a graph is the natural representation of the different interconnections between malware and malicious actors, and that a graph database is the right approach as an underlying technology for efficient malware analysis at large scale.
Stefan will co-present the following talk: Building a Graph User-Interface for Malware-Analysis


Sam Jacob (Houston)

Sam Jacob is a Software Engineer, Technical Consultant and Application Architect with over 20 years of experience working in insurance, oil & gas, energy trading, retail and software industries. He has been building applications for Expero using a mixture of Apache Spark, DataStax Graph and JanusGraph, in conjunction with common RDBMSs. This work frequently leveraged various message brokering and java middleware technologies.
Sam will be co-presenting the following session: Securing your Analytics Environment at Scale

Rick Paiste (Houston)

Rick Paiste brings 30 years of solutions experience to Expero as Managing Consultant and Architect. Working closely with the architecture and development team, he drives integration between disparate systems, technologies and company cultures. His boots-on-the-ground approach to tackling complex database design and implementation complements high-level strategizing for data modeling and integration. Rick’s industry expertise spans oil and gas exploration, power and broadband and commodity trading to name a few.
Prior to joining the team at Expero, Rick spent more than a decade leading a boutique consultancy specializing in large-scale software development for the energy industry. Again working at both the strategic and tactical levels, he was responsible for developing and executing product portfolio roadmaps, integrating enterprise systems and managing a broad range of stakeholders.
Rick will be co-presenting the following session: Securing your Analytics Environment at Scale

Victor Lee (Kent, Ohio)

Dr. Victor Lee is Director of Product Management at TigerGraph, overseeing all its product lines. He brings a strong academic background, decades of industry experience, and a commitment to quality and service. Victor was a circuit designer and technology transfer manager at Rambus, before returning to school for his computer science doctorate, focusing on graph data mining. He received his BS in Electrical Engineering and Computer Science from UC Berkeley, MS in Electrical Engineering from Stanford University, and PhD in Computer Science from Kent State University. Before joining TigerGraph in 2015, he was a visiting professor at John Carroll University.
Dr. Lee will be giving the following 90 minute workshop: Using Graphs to Improve Machine Learning and Produce Explainable AI

Josh Perryman (Bryan / College Station) @joshperryman

Josh Perryman likes to play with data. Oftentimes this is implementing proprietary algorithms closer to the data for performance or scale. Sometimes it is ad-hoc investigation and analysis, a sort of exploratory querying. A few times he’s been able to leverage his experience with data engines for dramatic performance improvements. But the real joy is designing a schema for both functionality and performance, one which increases the productivity of other developers and enables a technology to solve new problems or deliver new value to the business.
Technology isn't just data, and Josh does more than just play with data. He’s worked with high performance computing (HPC) environments, taking computations from hours to minutes or seconds. He has built visualizations which deliver new insights into complex data domains. He’s managed technology personnel, both directly and indirectly, to deliver technology solutions. Josh has put together more types of technology components, software and hardware, than can be counted, because one of his fortes is solving problems by building sustainable systems. Josh is currently Director of Product Development, Graph Solutions at VeracityID.

Stefan Plantikow (Berlin)

Stefan Plantikow is the Project Lead and Editor for the next generation declarative Graph Query Language GQL (ISO/IEC 39075). He works at Neo4j, the leading property graph database company, as a Standards Expert and Product Manager in the Query Language Standards and Research team, which is undertaking research into graph query languages and language standards, with the aim of evolving the state of the art of property graph querying.
Stefan's background is in Computer Science with a focus on distributed systems and transaction processing. In the past, he has worked on enterprise application integration, large-scale climate data management, and scalable overlay networks. At Neo4j, he played key roles in the design of the Cypher graph query language and the openCypher project, the development of the first cost-based planner for property graph databases and pioneered the architecture of Cypher for Apache Spark and Neo4j Morpheus. Stefan is passionate about computer language design, how languages as a medium enable access to new technology, and related topics, as well as continuously exploring how to expand the scope and applicability of graph technology in a way that makes it easily accessible to users. Stefan is currently based in Berlin, Germany.
Stefan will present the following session: GQL: Get Ready for a Standard Graph Query Language

Steve Sarsfield (London) @stevesarsfield

Steve Sarsfield, VP of Product at Cambridge Semantics / AnzoGraph, is a long time industry expert with experience at Talend, Vertica and now Cambridge Semantics. He is also author of the book The Data Governance Imperative. Steve has more than 20 years of experience in databases, analytics, information quality, big data and data governance.
Steve will co-present the following session: Case Study: AnzoGraphDB at Parabole.ai.

Semih Salihoglu (Waterloo, Ontario) @phoo

Semih Salihoglu is an assistant professor at University of Waterloo's Cheriton School of Computer Science. He is a member of the Data Systems Research Group. Dr. Salihoglu does both systems and theoretical research in data management and processing. His systems work focuses on developing systems for managing, querying, or doing analytics on graph-structured data. His main on-going systems projects include Graphflow, which is a new graph database his team are building from scratch, and GraphWrangler which is a system designed to give an immediate graph-view on relational data. His theoretical work focuses on studying theoretical aspects of distributed algorithms for query processing.
Semih Salihoglu will be presenting the following Global Graph Summit session: Query Processor of GraphflowDB and Techniques for the Graph Databases of 2020s.

Michael Uschold (Seattle, WA) @UscholdM

Michael Uschold, Senior Ontology Consultant at Semantic Arts, has over twenty-five years’ experience in developing and transitioning semantic technology from academia to industry. He pioneered the field of ontology engineering, co-authoring the first paper and giving the first tutorial on the topic in 1995 in the UK.
As a senior ontology consultant at Semantic Arts since October 2010, Michael trains and guides clients to better understand and leverage semantic technology using knowledge graphs. He has built commercial enterprise ontologies in digital asset management, finance, healthcare, legal research, consumer products, electrical devices, manufacturing and corporation registration. More recently he has focused on semantic application development using SPARQL for application code and R2RML for converting relational data into a knowledge graph.
During 2008-2009, Uschold worked at Reinvent on a team that developed a semantic advertising platform that substantially increased revenue. As a research scientist at Boeing from 1997-2008 he defined, led and participated in numerous projects applying semantic technology to enterprise challenges. He is a frequent invited speaker and panelist at national and international events, and serves on the editorial board of the Applied Ontology Journal. He received his Ph.D. in AI from Edinburgh University in 1991 and an MSc. from Rutgers University in Computer Science in 1982.
Michael will present the following 90 minute workshop: Ontology for Data Scientists.

Ted Wilmes (Oklahoma City) @trwilmes

Ted Wilmes, Data Architect at Expero, is a graduate of Trinity University where he studied computer science and art history. He started his professional career at a not-for-profit research and development institution where he performed contract software development work for a variety of government and commercial clients. During this time he worked on everything from large enterprise systems to smaller, cutting edge research and development projects. One of the most rewarding parts of each of these projects was the time spent collaborating with the customer.
As Ted’s career continued, he moved on to an oil and gas startup and continued to dig deeper into the data side of software development, gaining an even deeper interest in how databases work and how to eke as much performance out of them as possible. During this time he became interested in the application of graph databases to certain problem sets. Today, at Expero, Ted enjoys putting his deep knowledge of transactional graph computing to work as he helps customers of all types navigate the burgeoning property graph database landscape.
Outside of work, Ted enjoys spending time with his family out-of-doors, listening to and playing loud music, and contributing to the Apache TinkerPop project as a committer and PMC member.
Ted will co-present the following session: JGTSDB: A JanusGraph/TimescaleDB Mashup


Martin Fowler of Thoughtworks holding a "fireside chat" for the Data Day 2019 audience.


Perennial Data Day favorite, Holden Karau, presenting the latest on Spark at DDTX19.


Jonathon Morgan, CEO of New Knowledge, discussing how to build a data science team, at DDTX19.