The Data Day Texas 2025 Sessions
Improve your RAG pipelines with semantic re-ranking
Susan Shu Chang - Elastic
Hallucinations in Generative AI can undermine trust and accuracy. Retrieval Augmented Generation (RAG) has emerged in the last few years as a proven solution - in short, RAG works by retrieving relevant ground truths from a knowledge base to pass to the LLM along with a prompt. However, its effectiveness hinges on retrieving the right information. If the retrieved content is irrelevant, we're back to square one. In this talk, we'll explore re-ranking, a technique with a long track record in recommender systems, to improve the relevance of retrieved data. Join this talk to learn how to make RAG outputs more accurate and improve your AI applications.
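The core idea is simple enough to sketch: a first-stage retriever (vector or keyword search) returns candidate passages cheaply, then a cross-encoder re-scores each (query, passage) pair jointly before the best few reach the LLM. A minimal illustration with the sentence-transformers library - the model name and candidate passages are placeholder assumptions, not Elastic's implementation:

from sentence_transformers import CrossEncoder

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    # The cross-encoder reads query and passage together, so it is far more
    # precise than the bi-encoder used for first-stage retrieval, at the
    # cost of scoring every candidate pair.
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]

# Candidates would come from your first-stage search of the knowledge base.
candidates = [
    "API keys can be rotated from the security console.",
    "Our offices are closed on public holidays.",
    "Key rotation invalidates the previous credential immediately.",
]
context = rerank("How do I rotate an API key?", candidates, top_k=2)
prompt = "Answer using only this context:\n" + "\n".join(context)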
Data Modeling in the Age of AI
Keith Belanger - SqlDBM
While AI can indeed be a tremendous asset, accelerating many of the routine tasks involved in data modeling, it cannot replace the strategic thinking and deep business understanding that a skilled data professional brings to the table. The art of interacting with stakeholders, understanding their unique needs, and translating those needs into a data model is something that, at least for now, AI cannot fully replicate. While the core concepts and patterns of data modeling have remained relatively stable over the years, the physical data platforms, formats, and volumes have changed dramatically. We now operate in a world where data is generated at an unprecedented scale and in a variety of formats, from structured relational data to unstructured text and multimedia. This shift has necessitated a corresponding evolution in how we approach data modeling. AI can and should be leveraged in your data modeling practice, but it should be seen as a tool that enhances human capability, not a replacement for it.
In this session, Keith Belanger will discuss how to combine the speed and efficiency of AI with the insight and experience of seasoned data professionals - in a hybrid approach that will allow your organization to not only keep pace with the demands of modern data environments, but also to innovate and lead.
From Office Cubicles to Independent Success: How to Create a Career and Thrive as a Freelance Data Scientist
Clair Sullivan - Clair Sullivan & Associates
In a world where corporate stability is increasingly uncertain, freelancing offers data scientists a powerful way to take control of their careers and insulate themselves from layoffs. This talk will empower you to take that leap with confidence, showing you how to build a freelance practice that not only sustains but thrives, even in turbulent times. Drawing on my own journey from corporate employee to independent freelancer, I’ll share the critical steps to ensure financial stability and client consistency, from setting up the right pricing models and navigating the business logistics of company formation to developing a network that leads directly to opportunities. You’ll learn how to position yourself to avoid being just another name in a crowded job application list, instead connecting directly with clients who value your expertise and have a real need for your work.
We’ll also tackle mindset shifts—breaking away from the corporate myth of job security and embracing the control that freelancing brings over your work and income. This talk goes beyond practical tips, inspiring you to see freelancing as a viable, empowering alternative to traditional employment, enabling you to focus on work that truly excites you while offering flexibility and peace of mind. If you’re ready to take charge of your future, this talk will guide you through every step, turning freelancing from a risky idea into a secure, fulfilling reality.
Moving Beyond Text-to-SQL: Reliable Database Access through LLM Tooling
Patrick McFadin - DataStax
In the quickly evolving landscape of data engineering, traditional text-to-SQL methods often struggle with non-deterministic outputs and potential errors. In this session, Patrick will explore an alternative approach: leveraging Large Language Model (LLM) tooling for direct and reliable database interactions. By integrating LLMs with databases, data engineers can achieve efficient data retrieval and manipulation without relying on intermediate SQL generation. This method improves reliability and performance and simplifies complex data workflows. Patrick will then show some of the work he has done on LangChain and LlamaIndex and the insights he gained along the way. Patrick will also review the current state of LLMs and explain why perfecting this methodology might be a needed survival technique.
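One framework-free sketch of this pattern, with table and function names invented for illustration: rather than letting the model emit raw SQL, you expose a small catalog of vetted, parameterized queries as tools, and the LLM only chooses a tool and supplies arguments. LangChain and LlamaIndex provide first-class bindings for exactly this kind of tool:

# Instead of asking the LLM to write SQL, expose a small set of vetted,
# parameterized operations as "tools" the model can call by name.
import sqlite3

def get_orders_by_customer(customer_id: int, limit: int = 10) -> list[tuple]:
    # The SQL is fixed and parameterized, so the model can never inject
    # arbitrary clauses; it only supplies validated arguments.
    conn = sqlite3.connect("shop.db")  # hypothetical database
    rows = conn.execute(
        "SELECT id, total, created_at FROM orders "
        "WHERE customer_id = ? ORDER BY created_at DESC LIMIT ?",
        (customer_id, limit),
    ).fetchall()
    conn.close()
    return rows

TOOLS = {"get_orders_by_customer": get_orders_by_customer}

def dispatch(tool_call: dict):
    # tool_call is the JSON a function-calling LLM returns, e.g.
    # {"name": "get_orders_by_customer", "arguments": {"customer_id": 42}}
    return TOOLS[tool_call["name"]](**tool_call["arguments"])

The output is deterministic for a given tool call, which is precisely what raw text-to-SQL cannot guarantee.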
Adopting AI in a Large, Complex Organization: Aspiration vs Reality
Hala Nelson - James Madison University
Because Data and AI have historically evolved within relatively separate communities, their capabilities, benefits, and adoption strategies are valued differently by different work teams and investment decision makers. Data emerged over the last decade as the new oil, and AI has now succeeded it as the combustion engine, igniting data to power innovation. Many of us are currently attempting to harness the power of AI technologies at large complex organizations, or small ones for that matter. Initiatives span a wide range of interests within an organization: AI specialists, data engineers, IT departments, strategists, ethicists, executives, and the people on the ground. How does an implementation team guide a 32,000-person institution to optimally adopt AI within the very short attention span of an executive who wants an immediate return on investment? Is striking a deal with Microsoft Copilot or OpenAI enough? Can resources be justified for the ambitious goal of creating a digital twin of the processes and systems of an entire organization, where AI can be applied to drive efficiencies and improvements? Are the required technologies, expertise, and resources available? In this talk I will present my experience of what it takes to move from aspiration to an implementable reality: from math to data to strategy to people to everything in between. We'll also try to answer the question on everyone's mind: will we eventually succeed, or will all our efforts end up in the wasteland of failed projects, efforts, funding, and time?
#ai
History and Future of Iceberg REST Catalogs
Lisa Cao - Datastrato
While Iceberg primarily concentrates on its role as an open data format for lakehouse implementation, it relies heavily on its catalog for tracking tables and allowing external tools to interface with the metadata. In Iceberg 0.14.0, the community introduced the REST Open API Specification, and there is an instructive history behind why it was developed and why the Iceberg community decided to provide a spec rather than its own service. In 2024 especially, we've seen many third-party catalog service providers pop up, each with its own unique flavour - but realistically, what outcome can we expect from this widespread adoption? Together, we'll review not only the history of the REST Catalog Spec, but the future of the many offshoot services it has sparked. Please note this talk is not a comparison of the catalog service providers (we'll save that for the data discussions!) but rather the rationale behind the Iceberg community providing a spec, and why everyone is hedging their bets on Iceberg as the next standard.
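The practical consequence of a spec rather than a service is that any compliant client can point at any compliant server. A minimal PyIceberg sketch - the endpoint, warehouse, and table identifier below are placeholders, and the same code should work against Polaris, Gravitino's Iceberg REST service, or any other spec-compliant catalog:

from pyiceberg.catalog import load_catalog

# Any REST-spec-compliant catalog can sit behind this URI; the client
# code does not change when you swap providers.
catalog = load_catalog(
    "my_catalog",
    **{
        "type": "rest",
        "uri": "http://localhost:8181",           # placeholder endpoint
        "warehouse": "s3://my-bucket/warehouse",  # placeholder warehouse
    },
)

print(catalog.list_namespaces())
table = catalog.load_table("analytics.events")    # placeholder identifier
print(table.schema())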
All Your Base Are Belong To Us: Adversarial Attack and Defense
Michelle Yi - Women in Data
The increasing deployment of generative AI models in production environments introduces new security challenges, particularly in the realm of adversarial attacks. While visually or textually subtle, these attacks can manipulate generative models, leading to harmful consequences such as medical misdiagnoses from tampered images or the spread of misinformation through compromised chatbots. This talk examines the vulnerabilities of generative models in production settings and explores potential defenses against adversarial attacks. Drawing on insights from attacks against Vision-Language Pre-training (VLP) models, which are a key component in text-to-image and text-to-video models, this talk highlights the importance of understanding cross-modal interactions and leveraging diverse data for crafting robust defenses.
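The attacks discussed in the talk target multimodal VLP models, but the basic mechanics are easiest to see in the classic single-model case. A minimal sketch of the Fast Gradient Sign Method (FGSM) in PyTorch, with the classifier and inputs assumed:

import torch
import torch.nn.functional as F

def fgsm_attack(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
                eps: float = 8 / 255) -> torch.Tensor:
    # Perturb each pixel by +/- eps in whichever direction increases the
    # loss; the change is visually negligible but can flip the prediction.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# x_adv = fgsm_attack(classifier, images, labels)
# A common defense is adversarial training: include x_adv in the training
# batch so the model learns to resist the perturbation.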
We Are All Librarians: Systems for Organizing in the Age of AI
Jessica Talisman - Adobe
Libraries have been used to collect, organize, and make human knowledge available for over 3,000 years. Over the years, practices such as collection development, curation, cataloging, serializing, recording and archiving have evolved as technology has advanced, keeping pace with the complexities of organizing for human and machine consumption. These information systems have proven to be very useful for AI, which benefits from clean, semantically structured data. What does this mean for the AI technologist? In the age of AI, we are all librarians, tasked with curating and making sense of vast amounts of data and information. As we navigate this new landscape, we would do well to learn from the expertise of librarians, who have spent centuries perfecting the art of organizing and making sense of the world's knowledge.
Empowering Change: Building and Sustaining a Data Culture from the Ground Up
Clair Sullivan - Clair Sullivan and Associates
This is a follow-up to Clair's session on Data Culture at Data Day Texas 2024.
Many organizations struggle to create a data culture that drives real business value, often facing issues such as misalignment between teams, unclear objectives, and poorly managed data practices. These challenges typically stem from simple yet correctable mistakes that, once addressed, can unlock significant potential. In this session, we’ll focus on practical steps that anyone in an organization can take, whether you are an individual contributor or a senior director, to align your teams around data-driven outcomes, improve data governance practices, and enhance collaboration across departments. You’ll learn how to influence data-driven decision-making processes, advocate for better data practices, and create an environment where data insights lead to measurable improvements. We will discuss practical approaches to champion data initiatives from within, regardless of your position, and drive meaningful change by influencing processes, communication, and shared goals. Additionally, we will explore how to build momentum for data projects by showcasing early wins and creating a feedback loop that promotes continuous improvement.
In this session, you will learn how to identify and address gaps in your team’s data culture, with a focus on driving measurable business outcomes. We’ll explore strategies to align business objectives with data insights at the team and departmental levels, making sure that data projects are closely tied to real business needs. You’ll discover ways to foster collaboration between technical and non-technical teams, ensuring that communication is clear and expectations are aligned. We will also cover how to calculate and demonstrate ROI from data initiatives, helping you build a strong case for continued investment in data-driven solutions. Ultimately, you’ll leave with practical approaches to championing data initiatives and creating a culture of continuous improvement, even without direct involvement from executive leadership.
Escape the Data & AI Death Cycle, Enter the Data & AI Product Mindset
Anne-Claire Baschet - Mirakl
Yoann Benoit - Hymaïa
A transformation is underway: soon, every digital product will encompass Data & AI capabilities.
However, we must recognize that Data and Product teams have distinct cultures and origins. Data teams possess an array of tools and technical expertise, yet they often struggle with quantifying the value they deliver. They frequently miss the mark in addressing the right problems that align with customer needs or in collaborating with Business-Product-Engineering teams.
This is where adopting a Product Mindset becomes paramount. Closing the divide between the Data and Product communities is imperative, as both groups must collaborate on a daily basis to create value for users and support businesses in reaching their goals.
In this talk, you will get insights into:
• Identifying and overcoming the most common traps that Data Teams fall into when delivering Data & AI initiatives
• Crafting impactful Data & AI Products that solve the right problems
• Scaling a Data & AI Product Culture throughout the whole organization and defining a Data & AI Product Strategy
#ai
The Outcomes Economy: A Technical Introduction To AI Agentic Systems, Multi-Simulations, & Ontologies
Vin Vashishta - V Squared
Linear has given way to exponential. Digital apps are tools for agents. Data models complex and dynamical systems. The goal is models training models and building their own tools, but nothing is designed to support that today. AI platforms must follow new architectural tenets.
The AI platform roadmap must be designed to accept the realities of where businesses are today: low data maturity and resistance to change. Businesses are in a state of continuous transformation or managed decline. As Sam Altman said, “Stasis is a myth,” which means startups and SMBs have a new competitive advantage.
The speaker will take a deep dive into the three primary architectural components of AI platforms. He’ll explain how they are constructed using real-world case studies from emerging AI platforms. This talk will touch on complex and dynamical systems modeling and where ontologies fit. The talk wraps up with a pragmatic approach to aligning technology with the business and its customers.
The human side of data: Using technical storytelling to drive action
Annie Nelson - GitLab / Annie's Analytics
Join Annie for a session on learning the art of technical storytelling. Drawing from her background in psychology and experience as a data analyst, Annie will share strategies that go beyond just communicating data - how to influence stakeholders from the start and throughout a project’s lifecycle. Whether you're at the kickoff of a project, guiding decisions along the way, or presenting final results, the way you tell the story can have a big impact on its success.
In this session, Annie will explore a practical framework for crafting technical stories that not only explain data but also build trust, influence decision-making, and inspire action at every stage of the process. She will also provide real-world examples of how to tailor your message for both technical teams and business leaders, so you can engage all of your stakeholders effectively. You’ll leave with actionable techniques that help you drive results by tapping into an overlooked tool in data: emotion.
How to Start Investing in Semantics and Knowledge: A Practical Guide
Juan Sequeda - Data.World
What do enterprises lose by not investing in semantics and knowledge? The ability to reuse data effectively, due to the lack of context and understanding of what the data means. How is AI going to use data if we don't even understand it? This is why we waste so much time and money, and why we lack strategic focus.
Many practitioners are already doing critical data and knowledge work, but it’s often overlooked and treated as second-class. In this talk, I will focus on practical knowledge engineering steps to start investing in semantics and knowledge and demonstrate how to elevate this data and knowledge work as a first-class citizen.
We’ll explore four key areas: communication, culture, methodology and technology. The goal is for attendees to leave with concrete steps on how to start investing in semantics and knowledge today, empowering them to be efficient and resilient.
Fundamentals of DataOps
Lisa Cao - Datastrato
While building pipeline after pipeline, we might wonder: what comes next? Automation and Data Quality, of course! Organizations today are facing complex challenges in the end-to-end deployment of data applications, from initial development to operational maintenance. This process requires seamless integration of CI/CD practices, containerization, data infrastructure, MLOps, and security measures. This session offers strategies and a complete beginner's roadmap for teams implementing their own DataOps infrastructure from scratch, empowering developers, architects, and decision-makers to effectively leverage open-source tools and frameworks for streamlined, secure, and scalable ML application deployments.
Deployment at scale of an AI system based on custom LLMs: technical challenges and architecture
Arthur Delaitre - Mirakl
Mirakl is transforming seller catalog onboarding through the deployment of a scalable AI system based on custom fine-tuned Large Language Models (LLMs) and state-of-the-art multimodal models. Traditional onboarding processes can take up to two months; the new system reduces this to mere hours, efficiently handling millions of products.
This presentation will delve into the technical challenges and architectural solutions involved in deploying custom LLMs at scale. Key topics include:
• Infrastructure Deployment: Building scalable environments for LLM inference.
• Model Fine-Tuning: Customizing LLMs and improving output quality by reducing hallucinations and increasing consistency.
• Micro-Service Architecture: Orchestrating model services and hosting for efficient operation, including the synergies of systems that combine LLMs with other ML models.
• Layered Approach: Selecting optimal results while minimizing computational costs.
Arthur will explore how these technologies are integrated into a production-ready system, discussing the strategies used to overcome scaling challenges and ensure high performance. Attendees will gain insights into deploying advanced AI systems in real-world environments, optimizing large-scale inference, and setting new industry standards in marketplace technology.
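As a toy sketch of the layered approach described above - all model names, the client, and the checks are hypothetical stand-ins, not Mirakl's system - the pattern is a cascade that escalates from the cheapest capable model only when validation fails:

def call_model(model: str, text: str) -> dict:
    # Stand-in for a real inference client (e.g. an internal HTTP endpoint).
    return {"title": text.strip().title(), "model": model}

def passes_checks(result: dict) -> bool:
    # Cheap deterministic validation: required fields present and non-empty.
    # In practice this is where schema and consistency checks would live.
    return bool(result.get("title"))

def extract_attributes(product_text: str) -> dict:
    # Walk the layers from cheapest to most expensive; stop at the first
    # output that survives validation, so most traffic never pays for the
    # largest model.
    for model in ("small-finetuned", "large-finetuned", "frontier-multimodal"):
        result = call_model(model, product_text)
        if passes_checks(result):
            return result
    return {"needs_human_review": True, "input": product_text}

print(extract_attributes("wireless noise-cancelling headphones"))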
Modeling in Graph Databases
Max De Marzi - maxdemarzi.com
Modeling is a word with many meanings; we will cover two. First, how to structure your graph data to take advantage of the mechanical sympathy of most graph databases. You'll learn how to use relationship types to partition your data to speed up your queries. Second, how to model your business domain as a graph. You'll learn how to relate data so it is easier to find clusters and create frequently traveled paths. We'll finish with the worst-case optimal joins (WCOJs) and view capabilities of newer graph databases and how they affect both types of modeling.
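A tiny illustration of the first kind of modeling: encoding a frequently filtered property into the relationship type itself, so the traversal engine only touches the edges a query needs. The movie-ratings schema and credentials below are invented for the sketch:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # placeholder

# Generic model: one relationship type, filter on a property. The engine
# must expand and inspect every RATED relationship for the user.
SLOW = """
MATCH (u:User {id: $id})-[r:RATED]->(m:Movie)
WHERE r.stars = 5
RETURN m.title
"""

# Partitioned model: the property is baked into the relationship type, so
# the traversal follows only the exact edges it needs.
FAST = """
MATCH (u:User {id: $id})-[:RATED_5]->(m:Movie)
RETURN m.title
"""

with driver.session() as session:
    titles = [record["m.title"] for record in session.run(FAST, id=42)]

The trade-off is write-side complexity: updating a rating now means deleting one relationship type and creating another.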
#graphday #datamodeling
Automating Financial Reconciliation with Linear Programming and Optimization
(90-minute deep-dive session)
Bethany Lyons - Assured Insights
Some of the gnarliest data quality problems arise when relationships that exist in the world are absent from the data. Suppose you've raised three payment requests, for $1,000, $2,000, and $3,000. Then $6,000 hits your bank account. Those three invoices should be linked to the $6,000 payment, but many systems fail to capture those links. As a result, you have to infer the relationships after the fact through a series of computational math techniques. This session will take you through real-world examples and challenges of such a solution, with broad applications across finance and financial services.
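The invoice example reduces to an exact subset-sum: choose the set of open invoices whose amounts total the observed payment. A minimal sketch with the PuLP MILP library, using invented amounts; real reconciliation adds tolerances, partial payments, and many payments competing for the same invoices:

from pulp import LpMaximize, LpProblem, LpVariable, lpSum

invoices = {"INV1": 1000, "INV2": 2000, "INV3": 3000, "INV4": 2500}
payment = 6000

prob = LpProblem("reconcile", LpMaximize)
# One binary decision per invoice: is it part of this payment?
x = {k: LpVariable(k, cat="Binary") for k in invoices}

# Objective: clear as many invoices as possible with this payment.
prob += lpSum(x.values())
# The selected invoices must sum exactly to the observed payment amount.
prob += lpSum(amount * x[k] for k, amount in invoices.items()) == payment

prob.solve()
matched = [k for k in invoices if x[k].value() > 0.5]
print(matched)  # ['INV1', 'INV2', 'INV3']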
GraphBI: Expanding Analytics to All Data Through the Combination of GenAI, Graph, and Visual Analytics
Weidong Yang - Kineviz
Existing BI and big data solutions depend largely on structured data, which makes up only about 20% of all available information, leaving the vast majority untapped. In this talk, we introduce GraphBI, which aims to address this challenge by combining GenAI, graph technology, and visual analytics to unlock the full potential of enterprise data.
Recent technologies like RAG (Retrieval-Augmented Generation) and GraphRAG leverage GenAI for tasks such as summarization and Q&A, but they often function as black boxes, making verification challenging. In contrast, GraphBI uses GenAI for data pre-processing—converting unstructured data into a graph-based format—enabling a transparent, step-by-step analytics process that ensures reliability.
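The pre-processing step is the part that lends itself to a quick sketch: ask a model for explicit triples and load them into the graph, so every downstream step can be inspected. A minimal version using the OpenAI client - the model name is a placeholder and this is an illustration of the general idea, not Kineviz's implementation:

import json
from openai import OpenAI

PROMPT = (
    "Extract entities and relations from the text below. Respond with JSON "
    'only, in the form {"triples": [["subject", "relation", "object"], ...]}.'
    "\n\nText: "
)

def text_to_triples(text: str) -> list[tuple[str, str, str]]:
    # The triples come back as ordinary data, so they can be reviewed,
    # corrected, and versioned before they ever reach the graph.
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable chat model works
        messages=[{"role": "user", "content": PROMPT + text}],
        response_format={"type": "json_object"},
    )
    payload = json.loads(resp.choices[0].message.content)
    return [tuple(t) for t in payload["triples"]]

print(text_to_triples("Acme Corp acquired Widgets Ltd in 2023."))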
We will walk through the GraphBI workflow, exploring best practices and challenges in each step of the process: managing both structured and unstructured data, data pre-processing with GenAI, iterative analytics using a BI-focused graph grammar, and final insight presentation. This approach uniquely surfaces business insights by effectively incorporating all types of data.
What Superintelligence Will Look Like
Superintelligence will be able to invent new knowledge and understand new situations. Current generative methods, such as the LLMs in chat models like ChatGPT, are insufficient for superintelligence because they only interpolate the human knowledge in their training data. We’ve seen flashes of superintelligence in limited and controlled domains such as chess, go, StarCraft II, protein folding, and even math problems. In more general domains, superintelligence will overcome the hallucination issues that plague current LLMs—an obstacle that even DARPA has identified as a barrier to adoption for critical applications.
Superintelligence will invent new cures for rare diseases and new scientific theories, and it will create art and entertainment personalized to your specific taste. Superintelligence will also understand novel situations, enabling it to serve as a customized teacher, explaining new concepts in terms of those that you already know. It will understand the requirements for tax and regulatory compliance and will be able to guide you through them. It may even be able to install CUDA on the first try.
To get there, we must overcome the limitations of LLMs. Current neural network algorithms are too compute-, memory-, and power-hungry. This inefficiency exists because they lack the ability to draw clean distinctions using abstractions. LLMs and associated methods also struggle with lifelong learning because they lack compiled knowledge and the pointers to reach it. These clean distinctions and pointers to compiled knowledge are benefits that come from symbolic systems. In the past, symbolic systems failed because they were unable to adapt to unexpected situations, but current LLMs can be used to dynamically write the symbolic systems on the fly to fit any new situation. This talk will cover what AI systems smarter than humans will look like, what it will take to combine neural networks and symbolic methods to achieve it, and the potential societal effects of this superintelligence.
Unleashing the Power of Multimodal GraphRAG: Integrating Image Features for Deeper Insights
GraphRAG has proven to be a powerful tool across various use cases, enhancing retrieval accuracy, language model integration, and delivering deeper insights to users. However, a critical dimension remains underexplored: the integration of visual data. How can images—so rich in contextual and relational information—be seamlessly incorporated to further augment the power of GraphRAG?
In this presentation, we introduce Multimodal GraphRAG, an innovative framework that brings image data to the forefront of graph-based reasoning and retrieval. By extracting meaningful objects and features from images, and linking them with text-based semantics, Multimodal GraphRAG unlocks new pathways for surfacing insights. From images embedded in documents to collections of related visuals, we’ll demonstrate how this approach enables more comprehensive understanding, amplifying both the depth and accuracy of insights.
WTF Is A Triple? My Journey From Neo4j To Dgraph
Will Lyon - Hypermode
As a longtime Neo4j user, I recently started working at Hypermode, the maintainers of the Dgraph graph database. As part of my work at Hypermode, I've become a user of Dgraph and have gone through the process of becoming an expert in a second graph database. In this talk, I'll highlight some of my learnings from comparing and contrasting these two graph databases, and give an overview of how Dgraph fits into what we're building at Hypermode: a fullstack framework for building intelligent applications alongside technologies like AI models, WebAssembly, and GraphQL.
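To answer the title question directly: a triple is the (subject, predicate, object) unit that Dgraph ingests, where even attributes are edges, while Neo4j attaches properties to nodes and relationships. A side-by-side sketch of the same two-person graph, using Python strings purely as containers for each database's syntax:

# Dgraph mutations take RDF-style N-Quad triples. Note that the attribute
# "name" is itself a triple, exactly like the follows edge.
DGRAPH_NQUADS = """
_:alice <name> "Alice" .
_:alice <follows> _:bob .
_:bob <name> "Bob" .
"""

# Neo4j expresses the same data as a property-graph pattern in Cypher,
# with the name stored as a property on each node rather than as an edge.
NEO4J_CYPHER = """
CREATE (a:Person {name: 'Alice'})-[:FOLLOWS]->(b:Person {name: 'Bob'})
"""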
Data Governance – It’s Time to Start Over
Malcolm Hawker - Profisee
After 20 years of trying, most Data Governance programs have failed to become anything more than a compliance check box. To realize more meaningful and impactful benefits from data governance, it’s time to start over.
If data governance is needed to realize the full value of data, then drastic changes are needed to how we approach the governance function. Companies seeking to use data governance as a lever of business transformation must:
- Transition from a rules-based system to an exception-based system
- Govern operational uses of data differently than analytical uses of data
- Quantify the value of data governance across both operational and analytical uses
- Integrate incentives to motivate governance behaviors and offset costs of governance
- Jettison outdated frameworks and approaches, including the misguided idea of data ‘ownership’.
Join Malcolm Hawker, the CDO of Profisee Software, as he shares his vision for a new approach to data governance that focuses on incentives and business value, not controls.
Optimisation Platforms for Energy Trading
Adam Sroka - Hypercube
As the energy sector transitions to new technologies and hardware, the data requirements are undergoing significant changes. At the same time, the markets in which energy systems operate are also evolving - giving traders and energy teams a vastly more complex set of options against which they need to make decisions.
The move to real-time data for battery energy storage system (BESS) operation and the addition of multiple markets make revenue optimisation for storage assets intractable for human operation alone.
In this talk, Adam Sroka will walk through one solution deployed at a leading BESS trading company in the UK that aligned probabilistic forecasting and stochastic methodologies with a linear optimisation engine to determine the best markets, prices, and trades for any given portfolio of mixed energy and storage assets.
Adam will walk through an architecture diagram for a system that integrates real-time, near real-time, and slow-moving data with AI-driven forecasts and the complexities of optimisation management.
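To give a flavour of the linear core, here is a deliberately simplified dispatch model: a single market, deterministic prices, and invented numbers. The production system described above layers probabilistic forecasts, multiple markets, and portfolio constraints on top of this kind of formulation:

from pulp import LpMaximize, LpProblem, LpVariable, lpSum

prices = [42.0, 35.5, 28.1, 55.9, 80.3, 61.2]  # £/MWh forecast (invented)
T = range(len(prices))
CAP, POWER, ETA = 4.0, 1.0, 0.9  # MWh capacity, MW limit, charge efficiency

prob = LpProblem("bess_dispatch", LpMaximize)
charge = [LpVariable(f"c{t}", 0, POWER) for t in T]
discharge = [LpVariable(f"d{t}", 0, POWER) for t in T]

# Revenue: sell energy on discharge, pay for energy on charge.
prob += lpSum(prices[t] * (discharge[t] - charge[t]) for t in T)

# State of charge must stay within the battery's physical limits at all times.
for t in T:
    soc = lpSum(ETA * charge[k] - discharge[k] for k in range(t + 1))
    prob += soc >= 0
    prob += soc <= CAP

prob.solve()
plan = [(charge[t].value(), discharge[t].value()) for t in T]
print(plan)  # charge in cheap hours, discharge in expensive ones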
Ontologies vs Ologs vs Graphs
Ryan Wisnesky - Conexus
Ontologies are seeing renewed interest as 'semantic applications' such as LLMs proliferate. In this talk Ryan will go over the hundred-plus-year history of ontologies, including their origin in taxonomies, the rise and fall of the semantic web, how knowledge graphs have been serving as lightweight ontologies, and how computational extensions to knowledge graphs turn them into ontologies proper. He will also describe the present of ontologies, including connections to formal logic and relational algebra, and 'ologs' - a category-theoretic kind of ontology that uniquely admits a way to bidirectionally exchange data between ontologies. Next he will discuss lambda-graph, a type-theoretic kind of ontology that allows results such as type inference to be put to work on graph data, with applications to Tinkerpop. Ryan will conclude by looking towards the future, describing how ontologies can guide data integration and warehousing and how they can add context to prompts to increase LLM accuracy.
Validating LLM-Generated SQL Code: A mathematical approach
Ryan Wisnesky - Conexus
Organizations using LLMs (Large Language Models) to generate SQL code face a significant hurdle: ensuring the generated code is reliable and safe to execute. Unforeseen errors in the code can lead to unpredictable behavior, ranging from minor inconveniences to catastrophic data loss. This lack of trust becomes a major roadblock in deploying LLM-based applications. In this talk we describe a technology that leverages advanced mathematics to rigorously analyze LLM-generated SQL code. The analysis goes beyond basic syntax checks, delving into complex logic and potential unintended consequences. For example, the analysis can detect missing join conditions. Once checked, LLM-generated SQL code can be deployed with assurance. In this talk we go through an in-depth example of a validation scenario and describe the formal methods required to build such a verifier at scale.
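The formal analysis described in the talk goes well beyond syntax, but even a syntactic pass catches one common instance of this bug class. A linter-level sketch using the sqlglot parser to flag explicit joins that lack a join condition - an illustration of the error category, not the verifier from the talk:

import sqlglot
from sqlglot import exp

def unconstrained_joins(sql: str) -> list[str]:
    # Flag JOINs with no ON / USING clause: a frequent LLM failure mode
    # that silently produces a cartesian product.
    tree = sqlglot.parse_one(sql)
    flagged = []
    for join in tree.find_all(exp.Join):
        if (not join.args.get("on")
                and not join.args.get("using")
                and join.kind != "CROSS"):
            flagged.append(join.sql())
    return flagged

print(unconstrained_joins(
    "SELECT * FROM orders JOIN customers ON orders.cid = customers.id"
))  # [] - the join is constrained
print(unconstrained_joins(
    "SELECT * FROM orders JOIN customers"
))  # the bare join is flagged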
The Future of Data Education and Publishing in the Era of AI
Jess Haberman - Anaconda
Michelle Yi - Women in Data
Hala Nelson - James Madison University
With easier access to expert knowledge, we are in the midst of a significant shift in the technical education and publishing landscapes. Do these advancements propel us toward educational bliss, or do they pose unprecedented threats to industry and academia? Join us as we seek to unravel the future of data and tech education.
The surge of generative AI content sparks a range of debates: Does it herald a new era of learning or threaten academic integrity? Will AI augment or overshadow human-generated educational materials? What implications does the proliferation of AI-generated content hold for authors and the discoverability of their work? Does democratized access to generative AI writing tools make our writing better and more efficient, or simply more generic? We will delve into the ramifications of AI tools on writing, teaching, and student learning, exploring the opportunities they present for knowledge dissemination and the concerns they raise with regard to content quality and correctness. Join us for a discussion on the future of data education and its transformative impact on the realms of technology, academia, and publishing.
Our esteemed panelists bring education and publishing perspectives:
Hala Nelson: Associate Professor of Mathematics at James Madison University and author of Essential Math for AI.
Michelle Yi: Board Member at Women in Data and advocate for STEM education among underrepresented minorities.
Jess Haberman (panelist and moderator): Content and education leader at Anaconda, leveraging 14 years of publishing experience, including as an acquisitions editor at O’Reilly Media.