Who spoke at Data Day Texas 2023

Plenary Keynote
Zhamak Dehghani (San Francisco) @zhamakd

Zhamak Dehghani (Linkedin) works as the CEO and founder of a stealth tech startup reimagining the future of data platforms with Data-Mesh-native technologies to get value from data rapidly, sustainably and at scale. She founded the concept of Data Mesh in 2018 and since has been implementing the concept and evangelizing it with the wider industry. She is the author of the O'Reilly book Data Mesh.
Zhamak serves on multiple tech advisory boards. She has worked as a technologist for over 24 years and has contributed to multiple patents in distributed computing communications. She is an advocate for the decentralization of all things, including architecture, data, and ultimately power.

Zhamak will be presenting the Plenary Keynote.

Data Engineering Keynote
Adi Polak (Israel) @AdiPolak

As Vice President of Developer Experience at Treeverse, Adi Polak shapes the future of data & ML technologies for hands-on builders. She also contributes to the open-source lakeFS project, a Git-like interface for object stores. In her work, she brings her vast industry research and engineering experience to bear in educating and helping teams design, architect, and build cost-effective data systems and machine learning pipelines that emphasize scalability, expertise, and business goals. Adi is a frequent worldwide presenter and the author of the upcoming O'Reilly book, Machine Learning With Apache Spark. She is regularly invited to serve on program committees and as an advisor for conferences like Data & AI Summit, Scale by the Bay, and others. Previously, she was a senior manager for Azure at Microsoft, where she focused on building advanced analytics systems and modern architectures. When Adi isn’t building data pipelines or thinking up new software architecture, you can find her on the local cultural scene or at the beach.

Adi will be presenting the following session:
Engineering Data Systems for the next 10 years of growth

Database Keynote
Gwen Shapira (SF Bay Area) @gwenshap

For the last year or so, Gwen Shapira has been working on something amazing, which she will be able to share soon. Previously, Gwen was an engineering leader at Confluent, where among other things, she managed the Cloud-Native Kafka team. She has almost two decades of experience working with code and customers to build scalable data architectures, integrating microservices, relational and big data technologies. Gwen is an author of two O'Reilly books: "Kafka: The Definitive Guide" and "Hadoop Application Architectures". Gwen is also a committer on the Apache Kafka and Apache Sqoop projects. When Gwen isn't coding or arguing about protocols, you can find her pedaling on her bike exploring the roads and trails of California, and beyond.

Gwen will be presenting the following session:
Things Databases Don't Do… But Actually Should

Spark Keynote
Holden Karau (San Francisco) @holdenkarau

Holden Karau (Wikipedia / Linkedin) is a queer transgender Canadian, Apache Spark committer, Apache Software Foundation member, and an active open source contributor. As a software engineer, she’s worked on a variety of distributed computing, search, and classification problems at Apple, Google, IBM, Alpine, Databricks, Foursquare, and Amazon. She graduated from the University of Waterloo with a bachelor of mathematics in computer science. Outside of software, she enjoys playing with fire, welding, riding scooters, eating poutine, and dancing. Holden is the author of multiple O'Reilly publications, including Learning Spark, High Performance Spark, Kubeflow for Machine Learning, as well as the upcoming Scaling Python with Dask and Scaling Python with Ray.

Holden will be presenting the following session:
Metaprogramming — making easy problems hard enough to get promoted (w/ Spark & Friends)

Geospatial Keynote
Bonny McClain (Greensboro, North Carolina) @datamongerbonny

Dr. Bonny McClain is a geospatial analyst and self-described human geographer and social anthropologist. Dr. McClain applies advanced data analytics, including data engineering and geo-enrichment, to poverty, race, and gender discussions. Her research targets judgments about structural determinants, racial equity, and elements of intersectionality to illuminate the confluence of metrics contributing to poverty. Moving beyond ZIP codes to explore apportioned socioeconomic data based on underlying population data leads to discovering novel variables based on location to build more context to complex data questions. Bonny is a member of the National Press Club, 500 Women Scientists, The Urban and Regional Information Systems Association (URISA), and Investigative Reporters and Editors, and a former member of Tableau Speaker’s Bureau, allowing access to a wide variety of health policy and health economic discussions. Bonny is the author of the upcoming O'Reilly publication: Python for Geospatial Data Analysis: Theory, Tools, and Practice for Location Intelligence.

Bonny will be presenting the Geospatial Keynote:
"one ant , one bird, one tree"....

Math Keynote
Hala Nelson (Alexandria, Virginia)

Hala Nelson (Linkedin) is an Associate Professor of Mathematics at James Madison University. She has a Ph.D. in Mathematics from the Courant Institute of Mathematical Sciences at New York University. Prior to her work at James Madison University, she was a postdoctoral Assistant Professor at the University of Michigan-Ann Arbor. Her research is in the areas of Materials Science, Statistical Mechanics, Inverse Problems, and the Mathematics of Machine Learning and Artificial Intelligence. Her favorite subjects are Optimization, Numerical Algorithms, Mathematics for AI, Mathematical Analysis, Numerical Linear Algebra and Probability Theory. She likes to translate complex ideas into simple and practical terms. To her, most mathematical concepts are painless and relatable, unless the person presenting them either does not understand them very well, or is trying to show off. Other facts: Hala Nelson grew up in Lebanon, during the time of its brutal civil war. She lost her hair at a very young age in a missile explosion. This event and many that followed shaped her interests in human behavior, the nature of intelligence, and AI. Her father taught her Math, at home and in French, until she graduated high school. Her favorite quote from her father about math is, "It is the one clean science."
Hala is author of the upcoming O'Reilly book: Essential Math for AI.

Hala will be presenting the following session:
How Math Simplifies AI.

Dr. Ying Ding (Austin)

Dr. Ying Ding is the Bill & Lewis Suit Professor of Information Technology at the University of Texas School of Information. Before that, she was a professor and director of graduate studies for the data science program at the School of Informatics, Computing, and Engineering at Indiana University. She led the effort to develop the online data science graduate program for Indiana University. She also worked as a senior researcher at the Department of Computer Science, University of Innsbruck (Austria) and the Free University of Amsterdam (the Netherlands). She has been involved in various NIH, NSF and European-Union funded projects. She has published 240+ papers in journals, conferences, and workshops, and served as a program committee member for 200+ international conferences. She is the co-editor of the book series Semantic Web Synthesis, published by Morgan & Claypool, the co-editor-in-chief of Data Intelligence, published by MIT Press and the Chinese Academy of Sciences, and serves as an editorial board member for several top journals in Information Science and Semantic Web. She is the co-founder of Data2Discovery, a company advancing cutting-edge AI technologies in drug discovery and healthcare. Her current research interests include data-driven science of science, AI in healthcare, Semantic Web, knowledge graphs, data science, scholarly communication, and the application of Web technologies.

Ying will be presenting the following session:
Biomedical Knowledge Graph to Power Better AI in Health.

Data Engineering - They wrote the book.

Joe Reis and Matt Housley of Ternary Data are co-hosts of the popular Monday Morning Data Chat (Spotify / Apple) as well as the Data Nerd Herd podcast. They are also co-authors of the bestselling O'Reilly book: Fundamentals of Data Engineering.

Joe Reis (Salt Lake City)

Joe Reis (Linkedin), Co-Founder and CEO of Ternary Data, is a “recovering data scientist,” and a business-minded data nerd who’s worked in the data industry for 20 years. His responsibilities have ranged from statistical modeling, forecasting, machine learning, data engineering, data architecture, and everything else in between. Joe also teaches at the University of Utah as well as runs several meetups, including The Utah Data Engineering Meetup and SLC Python. When he’s not busy running a company, teaching, or creating content, Joe often finds himself DJing/making music, rock climbing, or trail running in the mountains around Salt Lake City, Utah.

Joe will co-present the following session:
Fundamentals of Data Engineering

Matthew Housley (Salt Lake City)

Co-Founder / CTO of Ternary Data as well as fellow “Recovering Data Scientist,” Matthew Housley is also a “Reformed Academic,” holding a PhD in Math and dual Masters degrees in both Math and Physics. It was only natural that he began his career in Academia as a Professor of Mathematics, before joining one of the largest e-commerce companies as a data scientist. His STEM background in combination with his knack for teaching makes him a mastermind at overhauling processes, improving teamwork, and incorporating engineering best practices so that real value is delivered to companies. While making the journey from data scientist to data engineer, Matt began to focus more on data & cloud engineering, working extensively with Amazon Web Services, Google Cloud Platform, Containers, Apache Airflow and GPUs, among other technologies. Matt (or should we say, “Dr. Housley”) is an adjunct faculty member in the David Eccles School of Business at The University of Utah.

Matthew will co-present the following session:
Fundamentals of Data Engineering

Data Quality Keynote
Chad Sanderson (Seattle)

If you follow online discussions on Data Quality or Data Products, you’ve no doubt come across Chad Sanderson. He’s a featured speaker at many conferences as well as a frequently requested interviewee on data podcasts. Most recently Chad was Head of Data at Convoy, where he oversaw the end-to-end data platform team (including data engineering, machine learning, experimentation, and data pipeline) as well as a multitude of other teams, all in service of helping thousands of carriers ship freight more efficiently. Chad has built everything from feature stores, experimentation platforms, metrics layers, and streaming platforms to analytics tools, data discovery systems, and workflow development platforms. He’s implemented open source and SaaS products (early and late-stage) and has built cutting-edge technology from the ground up.
Chad’s current initiative is the Data Quality Camp — the details of which he will be sharing while at Data Day Texas.

Chad will co-present the following session:
Data Contracts - Accountable Data Quality

Healthcare Data Keynote
Andrew Nguyen (SF Bay Area)

Dr. Andrew Nguyen has been working at the intersection of healthcare data and AI for more than a decade. He quickly discovered graph databases and has been using them to harmonize disparate data sources for nearly as long. He has worked for a variety of organizations ranging from academia to startups. Andrew is currently a medical informatics architect and leads the Data Architecture and Informatics capability for real-world data at one of the largest biopharma companies in the world, where he is designing scalable solutions to harmonize healthcare RWD sources for all levels of analytics from statistics to machine learning. Prior to his current role, he served as chair of the Department of Health Professions, and director of the MS in Health Informatics program at the University of San Francisco. He also taught classes in medical informatics, semantic interoperability, machine learning, clinical natural language processing, and biosignal/time series data analysis. Andrew holds a PhD in biological and medical informatics from the University of California, San Francisco (UCSF) and a BS in electrical and computer engineering from the University of California, San Diego (UCSD). In his spare time, he enjoys photography, hiking/backpacking, and SCUBA diving, and serves as the technical rescue coordinator for his local Search and Rescue team. Andrew is the author of the recently published O'Reilly book: Hands-On Healthcare Data.

Andrew will present the following session:
AI in Healthcare: Opportunities Amid Landmines

ChatGPT Keynote
Jans Aasman (SF Bay)

Jans Aasman (Wikipedia / LinkedIn) is a Ph.D. psychologist and expert in Cognitive Science, as well as CEO of Franz Inc., an early innovator in Artificial Intelligence and provider of the graph database, AllegroGraph. As both a scientist and CEO, Dr. Aasman continues to break ground in the areas of Artificial Intelligence and Knowledge Graphs as he works hand-in-hand with numerous Fortune 500 organizations as well as US and Foreign governments. Jans recently authored an IEEE article on “Enterprise Knowledge Graphs”.
Dr. Aasman spent a large part of his professional life in telecommunications research, specializing in applied Artificial Intelligence projects and intelligent user interfaces. He gathered patents in the areas of speech technology, multimodal user interaction, and recommendation engines while developing precursor technology for tablets and personal assistants. He was also a professor in the Industrial Design department of the Technical University of Delft. Dr. Aasman is a noted conference speaker at such events as Smart Data, NoSQL Now, International Semantic Web Conference, GeoWeb, AAAI, Enterprise Data World, Text Analytics, and TTI Vanguard, to name a few.

Jans will present the following session: Neuro-Symbolic Story Extraction from Natural Language

NLP Keynote
John Bohannon (SF Bay Area) @bohannon_bot

John Bohannon (Wikipedia / Linkedin) is currently Director of Science at Primer, an artificial intelligence company headquartered in San Francisco. Bohannon worked as an award-winning journalist for over a decade, orchestrating sting operations on fraudulent academic publishers and embedding with NATO forces in Afghanistan. He is the founder of the annual Dance Your PhD contest. He has a PhD from the University of Oxford.
To see why we invited John to be our NLP Keynote for 2023, check out the following interviews:
Data Exchange Podcast (Episode 144): John Bohannon,
Multimodal, Multi-Lingual NLP at Hugging Face with John Bohannon and Douwe Kiela,
Talk with John Bohannon, Director of Science at Primer,
Trends in NLP with John Bohannon,
John Bohannon Interview - Taming arXiv with Natural Language Processing.

John will present the following session:
Data Demonology

Data Lakehouse Keynote
Bill Inmon (Castle Rock, Colorado)

Bill Inmon (Wikipedia / LinkedIn) is an American computer scientist, recognized by many as the father of the data warehouse. Inmon wrote the first book, held the first conference, wrote the first column in a magazine and was the first to offer classes in data warehousing. Inmon created the accepted definition of what a data warehouse is - a subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions. Bill is among the most prolific and well-known authors in the big data analysis, data warehousing and business intelligence arena. In addition to authoring more than 50 books and 650 articles, Bill has been a monthly columnist with the Business Intelligence Network, EIM Institute and Data Management Review. In 2007, Bill was named by Computerworld as one of the “Ten IT People Who Mattered in the Last 40 Years” of the computer profession.

Bill will present the following two sessions:
1. Turning your Data Lake into an Asset
2. The Architected Cloud Environment

Data ROI Keynote
Juan Sequeda (Austin) @juansequeda

Dr. Juan Sequeda is the Principal Scientist at data.world. He joined through the acquisition of Capsenta, a company he founded as a spin-off from his PhD research in Computer Science from The University of Texas at Austin. His goal is to reliably create knowledge from inscrutable data. His research and industry work has been on designing and building Knowledge Graph for enterprise data and metadata management.
Juan has researched and developed technology on semantic data virtualization, graph data modeling, schema mapping and data integration methodologies. He pioneered technology to construct knowledge graphs from relational databases, resulting in W3C standards, research awards, patents, software and his startup Capsenta, which was acquired by data.world in 2019. Juan strives to build bridges between academia and industry as past co-chair of the LDBC Property Graph Schema Working Group, member of the LDBC Graph Query Languages task force, standards editor at the World Wide Web Consortium (W3C) and member of organizing committees of scientific conferences, including being the general chair of The Web Conference 2023. Juan is the co-author of the book Designing and Building Enterprise Knowledge Graphs and the co-host of Catalog and Cocktails, an honest, no-bs, non-salesy data podcast.

Juan will present the following session:
Show me the money: Practical tips for data teams to show ROI

Chris Tabb (London)

Chris Tabb started his career in the Business Intelligence/Analytics domain 30 years ago. He began at Cognos in the 90’s, working in the back office before becoming an expert in all their products, and left to become an independent BI consultant in 1998. Chris has followed the evolution of the analytics industry, working hands-on with all the technologies in the ecosystem: Databases, ETL/ELT, BI/OLAP/Visualisation Tools, Big Data Technologies, and Infrastructure (on premises and cloud) across many vendors, some old, some new. Recently, with a focus on the Modern Data Stack Evolution, Chris has started many movements centered on Business Value, using a number of hashtags to raise awareness (#bringbackdatamodelling / #bringbackdatamodeling, #bringbackdocumention) under the umbrella of #meandatastreets, which is focused on simplifying Data Platform architecture and on Business Value.

Chris will present the following session:
The Modern Data Stack Evolution - 2023 will be the year of consolidation

Santona Tuli (Washington DC)

Santona Tuli, PhD, started working with data through fundamental physics, analyzing massive event data from particle collisions at CERN. Since then, she has worked as a machine learning engineer in the NLP sector, and on product engineering for the programmatic data workflow orchestration tool Airflow. Currently at Upsolver, she works on a framework for authoring data pipelines declaratively in SQL. Dr. Tuli is passionate about building and empowering others to build end-to-end data and machine learning pipelines scalably. She has also been featured in the 3D IMAX movie Secrets of the Universe, which showcases real scientists pushing the frontiers of knowledge. In her STEM outreach work, she emphasizes representation, equity, advocacy and empowerment.

Santona will present the following session: Authoring unified batch and streaming workflows

Ryan Wisnesky (Cambridge, Massachusetts)

Ryan Wisnesky (LinkedIn) obtained B.S. and M.S. degrees in mathematics and computer science from Stanford University and a Ph.D. in computer science from Harvard University, where he studied the design and implementation of provably correct software systems. Previously, he was a postdoctoral associate in the MIT department of mathematics, where he developed the categorical query language CQL. He currently leads open-source and commercial development of CQL as CTO of Conexus AI. He maintains an active collaboration with the information-integration department of IBM Research, where he contributed to the Clio, Orchid, and HIL projects.
Ryan will present the following session: Computational Trinitarianism

Heather Hedden (Boston) @hhedden

Heather Hedden (LinkedIn) has been a taxonomist for over 26 years in various organizations and as an independent consultant. She is currently a data and knowledge engineer on the professional services team of Semantic Web Company, vendor of PoolParty software. Previously, she worked as a taxonomist at Cengage Learning, Gale, Viziant, First Wind, and Project Performance Corporation. Heather has designed and developed taxonomies, ontologies, and metadata schema for internal and externally published content. She gives workshops on taxonomy creation at conferences, as corporate training, and through an independently offered online course. Heather is the author of The Accidental Taxonomist.
Heather will host the following workshop: Introduction to Taxonomies for Data Scientists (workshop)
Heather will also have a booksigning / meet and greet for her book The Accidental Taxonomist at 5:10pm in Salon AB

Paige Roberts (Austin) @RobertsPaige

With two decades in the data management industry, Paige Roberts (Linkedin) has worked as an engineer, a trainer, a marketer, a product manager, and a consultant. Now, as Open Source Relations Manager at Vertica, she promotes understanding MPP data processing, open source, and how the analytics revolution is changing the world. Paige is a contributor to the upcoming O'Reilly publication 97 Things Every Data Engineer Should Know.
Paige is a total geek who is into role-playing games, LARP’ing in the SCA, Doctor Who, superheroes, space exploration, comics, Tolkien, etc. Paige writes and publishes fantasy and science fiction stories under her maiden name Paige E. Ewing. She won the Kennedy Space Center’s global Space Apps Challenge three years ago for coming up with an idea for growing food on Mars. And she's a pretty mean shot with a recurve, crossbow, or long bow.

Paige will host the following session:
Build your analytics architecture for performance and growth on a budget

Sanghamitra Deb (SF Bay) @sangha_deb

Sanghamitra Deb is a Data Scientist at Chegg, where she works on problems related to school and college education to sustain and improve the learning process. Her work involves recommendation systems, graph modeling, deep NLP analysis, data pipelines and machine learning. Previously, Sanghamitra was a data scientist at Accenture, where she worked on a wide variety of problems related to data modeling, architecture and visual storytelling. Sanghamitra is active in Data Science outreach and believes in applying analytics to a range of domains such as pharma, HR, customer support, market research, etc. Prior to being a data scientist, she was an astrophysicist who studied the structure of the universe by modeling galaxy clusters.

Sanghamitra will present the following session:
Computer Vision Landscape at Chegg: Present and Future

Janet Six (DFW) @janetmsix

Janet Six is a Product Manager at Tom Sawyer Software, where she helps companies design easier-to-use products within their financial, time, and technical constraints. For her research in information visualization, Janet was awarded the University of Texas at Dallas Jonsson School of Engineering Computer Science Dissertation of the Year Award. She was also awarded the prestigious IEEE Dallas Section 2003 Outstanding Young Engineer Award. Her work has appeared in the Journal of Graph Algorithms and Applications and the Kluwer International Series in Engineering and Computer Science. The proceedings of conferences on Graph Drawing, Information Visualization, and Algorithm Engineering and Experiments have also included the results of her research.
Janet will present the following sessions:
Visualizing Connected Data as It Evolves Over Time
Where Is the Graph? Best practices for extracting data from unstructured data sources for effective visualization and analysis
#graphday #visualization

Dean Wampler (Chicago) @deanwampler

Dean Wampler is an expert in data engineering for scalable, streaming data systems and applications of machine learning and artificial intelligence (ML/AI). He is the Director of Engineering for the Accelerated Discovery Platform at IBM Research. Previously, he worked at Domino Data Lab on their data science platform, he worked on scalable ML with Ray at Anyscale, and he led an engineering team at Lightbend developing distributed streaming data systems using Apache Spark, Apache Kafka, Kubernetes, and other tools. Dean is the author of several books, reports, and videos for O'Reilly Media, including Programming Scala, Third Edition, Fast Data Architectures for Streaming Applications, What Is Ray?, Functional Programming for Java Developers, and Programming Hive (coauthor). He is a contributor to several open source projects and a frequent conference speaker and co-organizer. Dean has a Ph.D. in Physics from the University of Washington.

Dean will present the following session: Reinforcement Learning with Ray RLlib

Ryan Boyd (Boulder) @ryguyrg

Ryan Boyd (LinkedIn) is a Boulder-based software engineer, data + authNZ geek and technology executive. He's currently a co-founder at MotherDuck, where they're making data analytics fun, frictionless and ducking awesome. He previously led developer relations teams at Databricks, Neo4j and Google Cloud. He's the author of O'Reilly's Getting Started with OAuth 2.0. Ryan advises B2B SaaS startups on growth marketing and developer relations as a Partner at Hypergrowth Partners. Prior to leading the Google Cloud Developer Relations team, he spent 7 years at Google working on 20+ different developer products and was the co-founder of Google Code Labs, which aimed to improve quality and stability of Google's developer products. Ryan graduated with a degree in Computer Science from Rochester Institute of Technology (RIT), where he later worked full-time building web applications + APIs and architecting the central web hosting platform.

Ryan will present the following session:
Your laptop is faster than your data warehouse. Why wait for the cloud? (DuckDB)

Metadata Keynote
Shirshanka Das (SF Bay Area) @shirshanka

Shirshanka Das (LinkedIn) is co-founder and CEO of Acryl Data, the company which is commercializing the open source DataHub project, a real-time metadata platform used by LinkedIn, Stripe, Pinterest, Optum, Expedia and many others. Prior to founding Acryl, he was the overall architect for Big Data at LinkedIn from 2010 to 2020, and responsible for creating the metadata and data management strategy at the company. As part of this, he founded the DataHub project and shaped its evolution to a metadata platform that powers DataOps, MLOps, productivity, and governance use cases at LinkedIn. He is also a PMC and committer on the Apache Gobblin project which manages 100PB+ of data assets at rest at LinkedIn, and is deployed in production at other large companies like Verizon, PayPal etc. Prior to LinkedIn, Shirshanka worked on high-performance serving systems at Yahoo and PayPal. Shirshanka has a Ph.D. in Computer Science from UCLA.

Shirshanka will present the following session: In Search of the Control Plane for Data.

Tomás Sabat Stöfsel (London, England) @tasabat

Tomás Sabat is the Chief Operating Officer at Vaticle. He works closely with TypeDB's open source and enterprise users who use TypeDB to build applications in a wide number of industries including financial services, life sciences, cyber security and supply chain management. A graduate of the University of Cambridge, Tomás has spent the last seven years founding and building businesses in the technology industry.
Tomás will present the following sessions:
What You Can't do With Graph Databases
Enabling the Computational Future of Biology

Artem Chebotko (Houston)

Dr. Artem Chebotko is a data professional and computer scientist with core expertise in data modeling, data engineering, data warehousing, and data analytics. For over 20 years, he has been leading and participating in research and development projects on NoSQL, Relational, Graph, Provenance, RDF, and XML databases. He is the inventor of the Big Data Modeling Methodology for Apache Cassandra and the author of over 50 peer-reviewed technical papers and presentations. He received his Ph.D. in Computer Science from Wayne State University in 2008.

Artem will present the following session: Database Schema Optimization in Apache Cassandra

Sivaram Arabandi (Houston) @ontomd

Sivaram Arabandi is a surgeon and an informaticist working at the intersection of clinical ontologies, semantic web and healthcare AI. He started ONTOPRO in 2013 and provides expertise in clinical data standards such as SNOMED, LOINC, RxNorm and ICD; building and using semantic models, text mining applications, as well as data integration and interoperability strategy to leverage structured and unstructured data for advanced big data analytics. He is currently involved with Optum Health's Clinical Decision Support (CDS) project to operationalize published clinical guidelines using ontologies and open data standards such as FHIR and CPG for point-of-care decision support. Prior to this, Sivaram was the Director for Smart Content Strategy at Elsevier and headed Clinical Terminology services responsible for developing EMMeT (Elsevier Merged Medical Taxonomy) at the core of the ClinicalKey product. He worked on Mayo Clinic's MayoExpertAdvisor (MEA) knowledge delivery tool and Knowledge Enriched Data (KED) projects integrating longitudinal clinical data spanning multiple years to decades. He was a visiting scientist at the National Library of Medicine (NLM), co-chair of the 5th International Conference on Biomedical Ontology (ICBO’14) and current co-chair for ICBO-2022. He collaborates with other researchers on the development of open clinical ontologies in areas of general medicine (OGMS), infectious diseases (IDO), newborn screening (ONSTR) and sleep medicine (SDO). He served on the External Review Board for Mayo Clinic's Knowledge Content Management System (KCMS) initiative and was a scientific advisor for Emory University’s Newborn Screening Follow-up Data Integration Collaborative (NBSDC).
Sivaram will present the following session: Ontology in Healthcare: a survey

Tim Berglund (Denver) @tlberglund

Tim Berglund is a teacher, author, and technology leader with StarTree, where he serves as the Vice President of Developer Relations. For over a decade, Tim has been a first-call speaker at conferences around the world. He can also be found on YouTube, where he has a reputation for explaining complex technology topics in an accessible way. He tweets as @tlberglund, blogs every few years at timberglund.com, and lives in Littleton, CO, USA. He has three grown children and two grandchildren, a fact about which he is rather excited.

Tim will be presenting the following session: An Introduction to Apache Pinot

Jeff Carpenter (Scottsdale, Arizona) @jscarp

Jeff Carpenter (Linkedin), co-author of Cassandra: The Definitive Guide (3rd edition available soon!), has worked on large-scale systems in the defense and hospitality industries. Jeff leads the Developer Advocate team at DataStax, where he uses his background in system architecture, microservices and Apache Cassandra to help empower developers and operations engineers to build distributed systems that are scalable, reliable, and secure.

Zachary Carrico (Austin)

Zac Carrico is currently developing a machine learning platform for blood-based cancer diagnostics company Freenome. Before joining Freenome, Zac did machine learning engineering at job-search engine company Indeed. He is passionate about simplifying and extending the use of machine learning in biotechnology. He received his PhD from Berkeley for work on bioengineering, and loves finding ways to make machine learning research fast and reproducible.

Zac will be presenting the following session: Lessons learned adopting Ray

Brandon Baylor (Houston)

Brandon Baylor is a Systems Engineer at Chevron, where he brings the principles and processes of composition to transform the energy industry. His multi-disciplinary experience working with international business units spans design, operations, safety, software, and human systems. As a longtime engineer, he is exploring ways to design and build systems that can deal with complexity in an integrated way and at a global scale. Brandon received his B.S. in Petroleum Engineering from Marietta College and his M.S. from Massachusetts Institute of Technology, where he is a System Design & Management Fellow. Brandon is also an author. His upcoming book, A Categorical Defense of Our Future, is set to be launched this August. In it, he and his co-author imagine a new foundation for engineering and point the way toward the complete paradigm shift that is required in order to save us from ourselves. It is a firsthand story of the difficulties of living in harmony with the systems we create.
Brandon will co-present the following oil/gas/energy session:
Beyond process safety: expanding assurance capabilities and guaranteeing system safety with mathematics
#OilGasEnergy #mathematics #ETL #integration

Scott Fincher (Austin)

As an experienced data scientist, Scott Fincher routinely teaches, presents, and leads group workshops covering topics such as the KNIME Analytics Platform, Machine Learning, and the broad Data Science umbrella. He enjoys assisting other data scientists with general best practices and model optimization. For Scott, this is not just an academic exercise. Prior to his work at KNIME, he worked for almost 20 years as an environmental consultant, with a focus on numerical modeling of atmospheric pollutants. Scott holds an MS in Statistics and a BS in Meteorology, both from Texas A&M University.

Scott will co-present the following 90 minute workshop:
Escaping Excel Hell - Break Free From Old Patterns with KNIME

Dashiell Brookhart (San Francisco)

Dashiell Brookhart is currently a Data Scientist at KNIME. A graduate of the University of San Francisco's MSDS program (Master of Science in Data Science), Dashiell has carried out data science related research on prostate cancer in tandem with the University of California, San Francisco's Department of Radiation Oncology. Dashiell has also worked on a wide variety of projects related to machine learning, NLP, and deep learning.

Dashiell will co-present the following 90 minute workshop:
Escaping Excel Hell - Break Free From Old Patterns with KNIME

Max De Marzi (Chicago) @maxdemarzi

Max De Marzi (Linkedin) is addicted to graphs. You may consider him a graph database enthusiast. He spent 8 years at Neo4j and recently made the switch to AWS Neptune. He is a blogger and an open source contributor, both activities that stem from one passion: teaching people about graphs. He is always open to talk graphs, always learning, and nothing thrills him more than finding easy graph solutions to hard relational problems. He has been helping people get to the "graph epiphany" for over a decade. He is an avid graph database modeler, leveraging his knowledge of mechanical sympathy and experience to deliver dozens of graph use cases over the years.
Max will present the following session: Outrageous ideas for Graph Databases

Yue Cathy Chang (Sunnyvale) @yuec

Yue Cathy Chang is an executive recognized for thought leadership and execution in digital transformation. She is passionate about addressing business challenges and often finds herself and her team "parachuting" into situations to tackle challenging and meaningful data needs. Cathy has led teams and functions at blue-chip enterprises as well as startups, across financial services and high-tech industries, working with leaders of centralized and distributed data teams, all betting the next product differentiation on data. She is currently an AVP in banking and financial services at an American multinational technology corporation.
Cathy holds MS and BS degrees in electrical and computer engineering from Carnegie Mellon University, MBA and MS degrees from MIT, and two granted US patents. She's a co-author, with Jike Chong, of the Manning publication How to Lead in Data Science.

Cathy will co-present the following Data Science sessions:
For the overwhelmed data professionals: What to do when there is so much to do?
Data Professional's Career: Techniques to Practice Rigor and Avoid Ten Mistakes?

Jike Chong (Sunnyvale) @jikechong

Jike Chong is an executive who nurtures teams and crafts cultures to produce billion-dollar business impacts. He built and grew multiple high-performing data functions in public and private companies and nurtured dozens of ambitious individual contributor data scientists into leaders; some have gone on to lead teams of more than 70 data scientists. Jike was part of the executive team that took Yiren Digital Ltd public on NYSE. He also expanded and led the data team as the chief data scientist at Acorns, designed and executed a project predicting venture investment risks at Silver Lake, and led the Hiring Marketplace Data Science team at LinkedIn, serving a business line with $4B a year in revenue.
Jike received his bachelor’s and master’s degrees in electrical and computer engineering from Carnegie Mellon University and a PhD in electrical engineering and computer science from the University of California, Berkeley. He's a co-author, with Yue Cathy Chang, of the Manning publication How to Lead in Data Science.

Jike will co-present the following Data Science sessions:
For the overwhelmed data professionals: What to do when there is so much to do?
Data Professional's Career: Techniques to Practice Rigor and Avoid Ten Mistakes?

Justin Fine (San Clemente)

Justin Fine joined Katana Graph in September of 2021 bringing experience in big data gained at Microsoft, Neo4j, and Accenture. Justin’s background in applied math includes a lot of graph and matrix algebra for solving graph-like problems. He later worked in anti-money laundering and fraud analysis in government, banking, and telecommunications. When Justin isn't nerding he enjoys scotch, cigars, and reading with his cat Penny.

Justin will present the following Data Integration session:
Augmenting Fraud Protection Pipelines At Scale Using Graph Analytics and AI Features

James Hansen (Houston)

James Hansen works as a Wells Engineer in the Upstream function at Chevron where he employs his hybrid digital and engineering skillset to transform traditional workflows by creating tools that unite disparate processes, conducted by cross-functional teams across the globe. He leads Chevron’s Systems Engineering CoP’s book club, currently exploring the advantages of categorical systems to address complexity at scale.
Before holding the title of Wells Engineer, James held the title of Lead Field Engineer on the Deepwater Bigfoot TLP Project as well as Global Performance Engineer. During his time as a Performance Engineer, he was responsible for several initiatives that are now standard practice in the O&G industry. They include probabilistic cost and time estimation, multi-level abstraction and normalization, real-time data ingestion and analysis, business intelligence visualization, and project lifecycle management solutions. James received his B.S. in Mechanical Engineering from Texas Tech University, where he competed in Formula SAE and NASA’s TSGC Design Competition. James is from, and currently resides in, Houston, Texas, where he lives with his dog Scout.
James will co-present the following oil/gas/energy session:
Beyond process safety: expanding assurance capabilities and guaranteeing system safety with mathematics
#OilGasEnergy #mathematics #ETL #integration

David Hughes (Seattle)

David Hughes is the Principal Graph Consultant for Graphable. He has 10 years of experience designing and building graph solutions which surface meaningful insights. His background includes clinical practice, medical research, software development, and cloud architecture. David has worked in healthcare and biotech within the intensive care, interventional radiology, oncology, cardiology, and proteomics domains. He enjoys endurance running, hiking, and spending time with his family in the outdoors when he is not enabling clients to have data epiphanies from their complex data.
David will be presenting the following session: Clinical trials exploration: surfacing a clinical application from a larger Bio-Pharma KnowledgeGraph

Joey Jablonski (Austin) @jrjablo

Joey Jablonski (LinkedIn) is VP of Analytics at Pythian, where he leads strategic engagements assisting customers in developing their data strategy, defining and executing on data governance programs, and building analytical models to power the modern data-driven organization. Prior to Pythian, Joey was VP of Product at Manifold, where he brought a product mindset to all engagements, allowing for delivery of value quickly in any project and building over time to drive adoption of new data-centric capabilities in an organization. Joey led engagements across industries including high tech, pharmaceuticals, and the federal government. Before Manifold, Joey held executive leadership positions at Northwestern Mutual, iHeartMedia and Cloud Technology Partners. He brings 20+ years of experience in software engineering, high performance computing, cyber security, data governance and data engineering.

Corey Lanum (Boston) @corey_lanum

Corey Lanum (LinkedIn) has a distinguished background in graph visualization. Over the last 15 years he has managed technical and business relationships with dozens of the largest defense and intelligence agencies in North America, in addition to working with many security and anti-fraud organizations in private industry. Prior to joining Cambridge Intelligence as their US Manager, Corey was helping the customers of i2 (now IBM) and SS8 to solve their most complex graph data challenges. Corey is the author of Visualizing Graph Data from Manning Publications.
Corey will be presenting the following session: Graphing without the database - creating graphs from relational databases.
#visualization #graphday

William Lyon (SFBay) @lyonwj

William Lyon (LinkedIn / blog) is a software developer at Neo4j. As an engineer on the Developer Relations team, he works primarily on integrating Neo4j with other technologies, building demo apps, helping other developers build applications with Neo4j, and writing documentation. Prior to joining Neo4j, William worked as a software developer for several startups in the real estate software, quantitative finance, and predictive API fields. William holds a master's degree in Computer Science from the University of Montana. William is the author of the Manning publication Full Stack GraphQL Applications With React, Node.js, and Neo4j and co-host of the GraphStuff.FM podcast.

William will lead the following 90 minute workshop: Hands-On Introduction To GraphQL For Data Scientists & Developers

Dave McComb (Ft Collins) @semanticarts

Dave McComb (Linkedin) is the President and co-founder of Semantic Arts. He and his team help organizations uncover the meaning in the data from their information systems. Dave is also the author of "The Data-Centric Revolution", "Software Wasteland" and "Semantics in Business Systems". For 20 years, Semantic Arts has helped firms of all sizes in this endeavor, including Amgen, DuPont, Procter & Gamble, Goldman Sachs, Schneider Electric, LexisNexis, Dun & Bradstreet, and Morgan Stanley. Prior to Semantic Arts, Dave co-founded Velocity Healthcare, where he developed and patented the first fully model-driven architecture. Prior to that, he was a part of the integration problem.

Dave will present the following Data Integration session:
Zero Copy Integration

Patrick McFadin (SF Bay) @patrickmcfadin

Patrick McFadin (Linkedin) is the VP of Developer Relations at DataStax, where he leads a team devoted to making users of DataStax products successful. He has also worked as Chief Evangelist for Apache Cassandra and consultant for DataStax, where he helped build some of the largest and most exciting deployments in production. Prior to DataStax, he was Chief Architect at Hobsons and an Oracle DBA/Developer for over 15 years.

Patrick will present the following two sessions:
Cassandra on ACID: this changes everything
Your data infrastructure will be in Kubernetes

#cassandra #datainfra #dataengineering

Alex Merced (Winter Park, FL) @alexmerced

Alex Merced (Linkedin) is a Developer Advocate for Dremio with a history of creating content to enable developers of all types through his personal projects like DevNursery.com, The Web Dev 101 Podcast, and the DataNation podcast. Alex Merced has been a developer with companies like Crossfield Digital, CampusGuard, GenEd Systems and others along with being an Instructor for General Assembly Bootcamps.
Alex will present the following sessions:
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg and the Right to Be Forgotten

Jonathan Mugan (Austin) @jmugan

Jonathan Mugan (Linkedin) is a researcher specializing in artificial intelligence, machine learning, and natural language processing. His current research focuses on deep learning for natural language generation and understanding. Dr. Mugan received his Ph.D. in Computer Science from the University of Texas at Austin. His thesis was centered in developmental robotics, which is an area of research that seeks to understand how robots can learn about the world in the same way that human children do. Dr. Mugan also held a post-doctoral position at Carnegie Mellon University, where he worked at the intersection of machine learning and human-computer interaction. One of the most requested speakers at the Data Day Texas conferences, he recently also spoke on the topic of NLP at the O’Reilly AI conference, and is the creator of the O’Reilly video course Natural Language Text Processing with Python. Dr. Mugan is also the author of The Curiosity Cycle: Preparing Your Child for the Ongoing Technological Explosion.

Jonathan will be presenting the following session: How to build someone we can talk to.

Andy Petrella (Liège, Belgium) @noootsab

Andy Petrella is an entrepreneur with a Mathematics and Distributed Data background. Andy is an early evangelist of Apache Spark and the creator of the Spark Notebook. He is also the author of the O'Reilly books “What is Data Observability” and “What is Data Governance”, and a trainer for “Distributed Data Science”, “Data Lineage Essentials”, and “Machine Learning Model Monitoring”. Andy is also the founder and CEO of Kensu, a data observability solution implementing the Data Observability Driven Development (DODD) method.

Andy will be presenting the following observability session:
How to automate data monitoring to support a scaling data strategy.

Arvind Prabhakar (SF Bay) @aprabhakar

Arvind Prabhakar is Chief Product Officer of StreamSets, a Software AG company. He has worked on data integration challenges for over 10 years. Before co-founding StreamSets, Arvind was an early employee of Cloudera, and led teams working on integration technologies such as Flume and Sqoop. A member of the Apache Software Foundation, Arvind is heavily involved in the open-source community as the PMC Chair for Apache Flume, the first PMC Chair of Apache Sqoop, and member of various other Apache projects. Prior to Cloudera, Arvind was a software architect at Informatica, where he was responsible for architecting, designing, and implementing several core systems.

Arvind will be presenting the following observability session:
A DataOps Approach to Global Data Observability.

Brent Schneeman (Austin) @schnee

Brent Schneeman swipes right for science and seeks to strengthen the scientific method muscle in whatever group he finds himself. Operating from a “lead by example” mindset, Brent frequently rolls up his sleeves and writes code to help bring predictive models to business problems. Passionate about building great teams and cultures, he’s pretty sure that a “servant leadership” posture is the right posture in his personal and professional lives.
Professionally, he tends to look after teams of data- and machine-learning-oriented contributors (analysts, scientists, and engineers) who collaborate on diverse sets of machine learning projects such as continuous optimization, customer churn prediction, fraud detection, and applying diverse techniques to unstructured data. Brent has worked at Vrbo, PayPal, Visa, and other small and large companies in individual contributor or management roles, mostly in product development organizations.
A storyteller, Brent has presented at the UT McCombs School, South By Southwest, NLP Day, multiple Data Days, and various meetups. He has one degree in Mathematics and another in Electrical Engineering and lives in Austin, Texas, with his wife, three kids, two cats and one dog. While he spends most of his free time mowing the lawn, he enjoys making photographs and running around downtown, and occasionally tries to make sense of neural network architectures.

Brent will present the following session: Bigger Data beats Better Math, No Question

Michael Berthold (Konstanz) @mrberthold

Michael Berthold (wikipedia) is co-founder of KNIME (wikipedia), the open analytics platform used by thousands of data experts around the world. He is currently CEO of KNIME AG and honorary professor at Konstanz University. Previously he held positions in both academia (Carnegie Mellon, UC Berkeley) and industry (Intel, Tripos). He has co-authored several books (the second edition of the Guide to Intelligent Data Science appeared recently), is an IEEE Fellow, and is a frequent speaker at both academic and industrial conferences. He continues to write and, if time permits, he still writes code, although it's usually not accepted into the KNIME code base anymore.

Michael will present the following session: Towards Data Science Design Patterns

Sean Robinson (Charlotte)

Sean Robinson is a versatile data scientist with several years of experience optimizing data processes and building intelligent data systems. He specializes in the use of graph data science and Neo4j to abstract complex systems within a domain into highly dimensional, interconnected knowledge graphs, uncovering novel insights that would otherwise remain dormant in other data structures. Sean currently serves as Lead Data Scientist at Graphable and also creates and teaches new network science courses in the University of North Carolina at Charlotte’s Data Science graduate program, where he instructs the next generation of data scientists on how to integrate graph data science into their toolkit.
Sean will be presenting the following workshop: Intro to Graph Data Science for Python Developers.

Michael Uschold (Seattle, WA) @UscholdM

Michael Uschold, Senior Ontology Consultant at Semantic Arts, has over twenty-five years’ experience in developing and transitioning semantic technology from academia to industry. He pioneered the field of ontology engineering, co-authoring the first paper and giving the first tutorial on the topic in 1995 in the UK.
As a senior ontology consultant at Semantic Arts since October 2010, Michael trains and guides clients to better understand and leverage semantic technology using knowledge graphs. He has built commercial enterprise ontologies in digital asset management, finance, healthcare, legal research, consumer products, electrical devices, manufacturing and corporation registration. More recently he has focused on semantic application development using SPARQL for application code and R2RML for converting relational data into a knowledge graph.
During 2008-2009, Uschold worked at Reinvent on a team that developed a semantic advertising platform that substantially increased revenue. As a research scientist at Boeing from 1997-2008 he defined, led and participated in numerous projects applying semantic technology to enterprise challenges. He is a frequent invited speaker and panelist at national and international events, and serves on the editorial board of the Applied Ontology Journal. He received his Ph.D. in AI from Edinburgh University in 1991 and an MSc. from Rutgers University in Computer Science in 1982.
Michael will lead the following 90 minute workshop: Ontology for Data Scientists

Weidong Yang (San Francisco) @wdyang

Weidong Yang is the founder and CEO of Kineviz. He holds a doctorate in Physics and a master's in Computer and Information Science. After conducting theoretical and experimental research on quantum dots, Weidong worked for 10 years as a product manager and R&D scientist in the semiconductor industry, where he invented Diffraction-based Overlay technology to improve the manufacturing precision of silicon wafers. He has been awarded 11 US patents and has contributed to 20+ peer-reviewed publications.
Weidong also co-founded Kinetech Arts, a non-profit organization that brings dancers and engineers together to explore the creative potential of making art via new technologies.
Weidong will lead the following session: Artistic Processing: the Untapped Power of Data Visualization