Laid off? In between positions? Take advantage of our Open-To-Work discount.

The Data Day Texas 2026 Sessions

We are continuing to publish session information. Expect many more session abstracts to be added over the next few weeks.

Learning Beyond Language: A New Geometric Paradigm for Better ML

Alexandra Pasi

“Bigger is better” has been a dominant philosophy in machine learning over the past decade. The principle has manifested itself in a variety of ways: in the promise that bigger datasets would deliver better model performance, and in the focus on increasingly complex model-ensembling techniques. These directions have produced many impressive results on specific benchmarks and in specific contexts, while failing to produce innovations that address the fundamental question of model generalizability outside the training set. The approach is also associated with ubiquitous practical challenges in deploying models to production, including frequent retraining, slow inference, and large, cumbersome models. New foundational approaches to the mathematical underpinnings of machine learning can improve model accuracy in production, as well as the compactness and speed of the models themselves. These approaches involve finding new ways of representing and leveraging the inherent geometry of the data.

Letter from a fish: Computing, objectivity and the limits of AI. A response to Bill Inmon

Ole Olesen-Bagneux

The fundamental binary logic of computing - 1 or 0 - sets very hard boundaries on what technology can achieve in the field of semantics.
The groundbreaking work of the English mathematician George Boole combined Aristotelian logic with equational expressions to show that formal reasoning could be treated through a mathematical lens. That work became Boolean logic.
Ultimately, Boole’s thinking ended up defining the logic gates that make up the circuits in computer hardware.
All bits in all computers pass through this logic, and as such, Boolean logic is the physical limit of how data can be processed. The very infrastructure through which we process semantics technologically is therefore hardwired to formal reasoning.
That is the boundary.
In his 2025 talk at Data Day Texas, computer scientist and author Bill Inmon applied his thinking on data to the world of text. As a figure of speech, Bill described himself as a human who lived on land - the world of data. The world of text, however, was water, and he could not express his thinking as if he were a fish; he had to express himself as a human of data. And that’s how the textual warehouse came into being - a surprising, functional evolution of the data warehouse.
But what happens when the world of text approaches the world of data? What if the fish suddenly made their way onto land, into the world of data? What would they see, and what would they say?
This is a letter from a fish.
It’s a story of how we have used technology to push text itself forward, as an enormous splash of semantic ambiguity beyond Boolean logic, told through three of the absolute pioneers of Library and Information Science: Suzanne Briet, Henriette Avram, and Elaine Svenonius.

Context > Prompts: Context Engineering Deep Dive

Lena Hall

Context Engineering is the art and science of designing and managing the information environment that enables AI systems to function reliably at scale. This technical session examines why focusing on prompt engineering alone leads to production failures and demonstrates how proper context management transforms unreliable AI into systems that actually work. We'll explore the fundamental difference between crafting instructions (prompts) and engineering what the model actually processes (context). You'll understand the four core activities of context engineering: persisting information outside the context window, selecting relevant information for each step, compressing histories to manage token limits, and isolating contexts to prevent interference. The session covers critical failure modes including context poisoning where errors compound over time, context distraction where historical information overwhelms reasoning, context confusion from irrelevant data, and context clash from contradictory information. We'll examine why these failures are inevitable without proper engineering and demonstrate specific techniques to prevent them. Through architectural patterns, we'll review context management in existing frameworks. You'll see how declarative approaches eliminate prompt string manipulation, how vector databases enable semantic memory, and how orchestration platforms coordinate context flow.
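To make the four activities above a bit more tangible, here is a minimal, illustrative Python sketch (class and function names are hypothetical and not drawn from any specific framework or from this session) showing one way an agent loop might persist, select, compress, and isolate context.

```python
# Minimal, illustrative sketch of the four context-engineering activities.
# All names here are hypothetical; a real system would use a vector store,
# a summarization model, and framework-specific abstractions.

from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Persist: keep information outside the model's context window."""
    memory: list = field(default_factory=list)

    def persist(self, item: str) -> None:
        self.memory.append(item)

    def select(self, query: str, k: int = 3) -> list:
        """Select: retrieve only the items relevant to the current step.
        (Naive keyword overlap stands in for semantic / vector search.)"""
        words = set(query.lower().split())
        scored = sorted(
            self.memory,
            key=lambda m: len(words & set(m.lower().split())),
            reverse=True,
        )
        return scored[:k]


def compress(history: list, max_items: int = 5) -> list:
    """Compress: truncate history to respect token limits.
    (A real system would summarize with a model instead of truncating.)"""
    return history[-max_items:]


def build_context(task: str, store: ContextStore, history: list) -> str:
    """Isolate: each sub-task gets its own freshly assembled context,
    so unrelated history cannot interfere with it."""
    relevant = store.select(task)
    recent = compress(history)
    return "\n".join(["TASK: " + task, *relevant, *recent])


store = ContextStore()
store.persist("Order #123 was refunded on 2025-01-02.")
store.persist("The user prefers concise answers.")
print(build_context("Summarize refund status for order #123", store, ["user: any update?"]))
```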

Observability, Evaluation, and Guardrails for Self-Optimizing Agents

David Hughes

As AI workflows and agents transition from experimentation to production, ensuring reliability, safety, and continuous optimization becomes crucial. Yet most projects begin with prompt engineering or model selection, without an observability and evaluation framework in place. This leads to brittle systems and missed opportunities for improvement. In this session, I'll explore how to build self-optimizing agents by integrating a monitoring framework for observability and evaluation with DSPy, a powerful framework for structured AI workflows. I'll cover why metrics matter, what to measure, and how evaluation outputs themselves can become the training data that drives optimization. You’ll see how real-time datasets generated from evaluations can be used to trigger optimization workflows. For example, when agent performance trends downward (e.g., task usefulness scores dropping below a threshold), prior high-scoring examples can be injected into DSPy workflows to optimize behavior in real time. I'll walk through a live demonstration: monitoring a DSPy workflow, observing metric trends, and triggering an optimization workflow when a guardrail is crossed. The session will close with a discussion of future directions for observability-first AI development.
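As a rough illustration of the guardrail-triggered loop described above, here is a hedged Python sketch. The metric, threshold, and the optimize_with_examples hook are hypothetical placeholders rather than the speaker's actual DSPy or monitoring setup.

```python
# Hypothetical sketch of a guardrail that triggers re-optimization when an
# evaluation metric trends downward. Names and thresholds are illustrative;
# a real system would plug in DSPy modules and an observability backend.

from statistics import mean

USEFULNESS_THRESHOLD = 0.75  # guardrail: re-optimize below this average score


def evaluate(outputs):
    """Stand-in evaluator: read each output's usefulness score (0 to 1)."""
    return [o.get("score", 0.0) for o in outputs]


def optimize_with_examples(high_scoring):
    """Placeholder for the optimization step, e.g. recompiling a DSPy program
    with prior high-scoring examples serving as its training set."""
    print(f"Re-optimizing with {len(high_scoring)} exemplar(s)...")


def guardrail_check(recent_outputs, archive):
    scores = evaluate(recent_outputs)
    if scores and mean(scores) < USEFULNESS_THRESHOLD:
        exemplars = [o for o in archive if o.get("score", 0.0) >= 0.9]
        optimize_with_examples(exemplars)


# Example: recent scores have drifted downward, so the guardrail fires.
archive = [{"task": "t1", "score": 0.95}, {"task": "t2", "score": 0.92}]
recent = [{"task": "t3", "score": 0.60}, {"task": "t4", "score": 0.70}]
guardrail_check(recent, archive)
```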

LLMs Expand Computer Programs by Adding Judgment

Jonathan Mugan

Even if large language models (LLMs) stopped improving today, they still have yet to make their real impact on computer systems. This is because LLMs expand the capability of programs by adding judgment. Judgment allows computers to make decisions from fuzzy inputs and specifications, just like humans do. This talk will cover the ramifications of adding judgment to computer systems. We will start with simple judgment-enabled functions and then move to workflows. Adding judgment represents a fundamental shift in how programming is done because instead of writing computer code, system designers will be writing higher-level specifications. We will discuss how these specifications should be written to build robust systems given the sometimes flaky judgment of LLMs.
We will also discuss how the judgment can be expanded by calling external tools. This tool-calling is a step toward agency, and the talk will also flesh out what agency really requires and the opportunities that it creates. Programs with agency can evaluate their own outputs and therefore improve them, leading to better outputs, which can then be improved. This virtuous cycle enables computer systems to begin to reflect biological ones, possibly even leading to conscious machines.
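To make the idea of a judgment-enabled function concrete, here is a minimal, hedged sketch. The call_llm helper is a hypothetical stand-in for whatever model client you use; the point is that the function's specification is natural language and the return value is a decision.

```python
# Illustrative sketch of a judgment-enabled function: instead of encoding
# brittle rules, the function delegates a fuzzy decision to an LLM.
# call_llm is a hypothetical stand-in for your model client of choice.

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to an LLM and return its text response."""
    raise NotImplementedError("wire up your preferred LLM client here")


def is_refund_justified(ticket_text: str) -> bool:
    """A higher-level specification rather than procedural code: the LLM
    judges whether the customer's message reasonably warrants a refund."""
    prompt = (
        "Answer strictly YES or NO.\n"
        "Does the following support ticket describe a situation that "
        "reasonably justifies a refund under a standard 30-day policy?\n\n"
        f"Ticket: {ticket_text}"
    )
    answer = call_llm(prompt).strip().upper()
    return answer.startswith("YES")


# Usage, once call_llm is wired to a real model:
#   if is_refund_justified(ticket):
#       issue_refund(ticket)
```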

Data Governance Keynote
Existence over Essence? Data Governance in times of AI

Winfried Adalbert Etzel

Data governance lies at the heart of socio-technical systems. This includes the foundation for how organizations can integrate automation, AI, and AI agents. As organizations shift their focus in AI from innovation to adaptation, the challenge extends beyond traditional governance to include the ethical, strategic, and operational implications of autonomous, probabilistic systems. AI is no longer only a tool but a force that is changing the composition of socio-technical systems, reshaping human-machine interactions, accountability structures, and organizational decision-making.

Data governance must refocus on its core and embrace the roles it has to play:
1. Data Negotiation, aligning business, regulatory, and technological demands with data requirements.
2. Data Direction, providing strategic orientation to ensure data and AI contribute to organizational purpose and values.
3. Data Audit, embedding accountability across human and machine decision-making.

The move from essence (data as static value) to existence (value through context and application) reframes governance as a force that intentionally structures socio-technical systems. To operationalize AI at scale, organizations must unify data and AI governance, embedding transparency, fairness, and human oversight into adaptive feedback loops. Ultimately, governance is less about control and more about shaping organizational meaning, ensuring that AI amplifies human agency rather than eroding it.

Scar Tissue: Lessons from 20 Years of Building Ontologies and Knowledge Graphs

Juan Sequeda

Knowledge graphs and ontologies are now everywhere and having a moment. But for many of us, this work started long before the hype.
I started working on ontologies and knowledge graphs (i.e. semantic web) in 2005, developing research-backed approaches to derive knowledge graphs from relational databases. Since then, I’ve taken that work beyond academia, into startups, products, and enterprises struggling to turn data into shared understanding.
This talk shares the scar tissue. This isn’t just a retrospective talk. It’s a practical guide grounded in experience, failure, and persistence.
I’ll cover the patterns I’ve seen repeat for 20 years: why efforts succeed and fail, what is still a challenge, why teams underestimate knowledge work and get stuck, how organizations can build semantic foundations without boiling the ocean, architectural approaches…and what I wish I had understood earlier technically, organizationally, and culturally.
If you’re building, buying, or betting on knowledge graphs today, this talk is a candid set of lessons learned from someone who’s been stubbornly working on the same core problem for 20 years, and is still learning. This talk may save you money and years of pain.

How to Hack An Agent in 10 Prompts: and other true stories from 2025

Matthew Sharp

There's never been a better time to be a hacker. With the explosion of vibe-coded solutions full of vulnerabilities, and the power and ease that LLMs and agents lend to hackers, we are seeing an increase in attacks. This talk dives into several vulnerabilities that agent systems have introduced and how they are already being exploited.

2026 Trends: Building Foundations That Endure

Sanjeev Mohan

The data and AI landscape is shifting from experimentation to production at scale. This session examines the architectural and organizational trends reshaping how enterprises build lasting data foundations in 2026.
We'll explore critical dimensions: what's changing in the technology landscape, which new approaches are emerging in response, and how to prepare your organization.
Drawing from recent analyst research and enterprise engagements across multiple industries, this talk provides guidance for data leaders navigating the transition from pilot projects to production AI systems. You'll leave with an understanding of which trends matter for your context and how to build foundations that can evolve with technological change rather than require replacement.

Rewriting SQLite in Rust. Here's What We've Learned.

Glauber Costa

SQLite is everywhere. SQLite is the most successful database ever built. Three billion installations, maybe more. It's on your phone, in your browser, running your message store.
It's also a synchronous, single-machine system designed when the web was young. No async I/O. No replication. No network protocol. This was fine—until developers started wanting to run it at the edge, sync it across devices, scale it to millions of databases.
In this session, Glauber Costa will show what it takes to make SQLite work for systems it was never meant to handle.
Not because SQLite is broken—it's not. But….
You can't bolt async onto a synchronous codebase.
You can't add memory safety to 150,000 lines of C without rewriting it.
And if you're going to rewrite it anyway, you might as well rethink the whole thing—which is what Glauber and his team at Turso have done. They are rewriting SQLite in Rust.
Glauber will share how they started with libSQL—a fork that's running production systems like Astro DB right now—and then dive into Turso: native async, deterministic simulation testing, the whole database reimagined for distributed systems.
You'll see how deterministic simulation testing finds bugs traditional testing misses. Why async matters even for local queries. What the actual technical tradeoffs are when you're trying to preserve compatibility while changing everything underneath. Where the edges are—what works, what doesn't, what they're still figuring out.
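For readers new to deterministic simulation testing, the core idea is that every source of nondeterminism (scheduling, I/O timing, injected faults) is driven by a single seeded random source, so any failing run can be replayed exactly from its seed. The toy Python sketch below illustrates that principle only; it is not Turso's Rust implementation.

```python
# Toy illustration of deterministic simulation testing: every "random"
# choice comes from one seeded PRNG, so a failing seed reproduces the
# exact same interleaving every time. This is a concept sketch only,
# not how Turso's Rust simulator is implemented.

import random


def simulate(seed: int) -> int:
    rng = random.Random(seed)             # single source of nondeterminism
    balance = 100
    ops = [("debit", 30), ("debit", 90), ("credit", 50)]
    rng.shuffle(ops)                       # simulated scheduling order
    for kind, amount in ops:
        if kind == "debit" and balance >= amount:
            balance -= amount
        elif kind == "credit":
            balance += amount
    return balance


# Explore many seeds; any surprising outcome is reproducible from its seed.
for seed in range(5):
    print(seed, simulate(seed))
```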

Agents are eating the semantic layer

Paul Blankley

When LLMs first hit the scene, the consensus was clear: you need a semantic layer for reliable, accurate results. The benchmarks proved it. The research confirmed it. I believed it too (and was one of the first voices saying so).
Then the models got better. And the consensus stopped being true.
Today, a semantic layer doesn't give your agent accuracy. It gives your agent a ceiling. It limits flexibility, constrains the questions your agent can answer, and forces you to anticipate every question in advance. That's not how data in enterprises actually works.
This talk covers where the semantic layer falls short, what we built instead, and how to architect agents that gather and create business context dynamically, without sacrificing governance or trust.

DataOps Is Culture, Not a Toolchain

Matthew Mullins

DataOps is often framed as a collection of tools. In practice it is a culture and a set of engineering behaviors adapted from software and platform teams to the realities of data work. This talk explores the cultural foundations of DataOps, including continuous improvement, learning from failure, blameless retrospectives, and measurement. We will explore the difference between DataOps and DevOps, then define what good measurement looks like for data teams. We will map DataOps outcomes to DORA while also drawing from SPACE and DevEx to capture satisfaction, collaboration, cognitive load, and flow. You will leave with concrete rituals, metrics, and anti-patterns to watch for.

Your Skills, Your Business: Layoff-Proof Your Career through Solopreneurship

Clair Sullivan

The tech industry has taught us a hard lesson: no job is guaranteed. But here's the good news: as a data scientist, you already have everything you need to take control of your career and make it layoff-proof.
In this talk, we'll explore how to transform your data expertise into a solopreneur business. You'll get an overview of the solopreneurship landscape, including the different paths available to data scientists and engineers: consulting, freelancing, fractional employment, creating digital products, and more. We'll cover how to start thinking about the financial viability of this model, how to create the actual business, where to find your first clients, and how you can even take those initial steps while still employed. By the end you will walk away with a clearer picture of what's possible and practical first steps to start exploring your options today.

The Human Layer of Data: Why Trust Lives in the Work Behind the Numbers

Thais Cooke

In data teams, trust is the real currency, and it starts long before a single line of code is written. When teams build it with intention, everything from collaboration to decision-making becomes stronger. This talk explores the human side of data work: the conversations, shared definitions, and real-world context that bring meaning to our metrics. We’ll look at how trust grows when data is not just accurate, but understood and aligned across teams. Attendees will leave with a clearer sense of how trust takes shape in everyday analytics work, and why it matters long before the numbers reach a dashboard.
Attendees should have a basic familiarity with working in data teams and an understanding of how data is used to inform decisions. Experience collaborating across roles or interacting with analytics, dashboards, or reports is helpful.
This session is completely tool-agnostic. It focuses on the human side of data work, so the concepts apply across any analytics environment, regardless of software or platform.
Attendees will leave with a clear understanding that trust is the foundation of reliable analytics and that many data challenges are human, not technical. They will learn how miscommunication, misaligned definitions, and lack of context can create the biggest obstacles, and why investing in this critical pre-work of conversations, shared definitions, and grounding metrics in a real-world context sets the stage long before a single line of code is written. The session also highlights how strengthening the human layer improves decision-making, cross-team collaboration, and organizational impact.

Data Visualization Keynote
"Who Needs a Chart When You Can Just Chat?" - the role of data visualization in a post-LLM world

Christian Miles

Data visualisation faces an existential question: in a world where users can simply ask questions of their data in natural language, what role remains for visual representation? In this talk, graph visualisation practitioner Christian Miles explores this tension.
While LLMs reshape how people access data, visualisation remains essential – charts surface outliers, gaps, and distributions without requiring you to ask the right question.
Christian will examine where AI advances visualisation practice (accelerated prototyping, simulated user testing, democratised tooling) while calling out current dead ends. Graph visualisation gets particular focus – a foundational model for understanding the connected world, with LLMs enabling new interactive experiences.
Christian will then examine visualisation's role in AI interpretability research and conclude with a jobs-to-be-done reframe: which jobs do charts do that LLMs structurally cannot absorb?

Iceberg for Agents - Elevating Lakehouse Data Into AI-Ready Context

Andrew Madson

AI agents fail in production because even though they're stuffed with data, they're starved for context. The models aren't the problem; the bottleneck is the data stack: fragmented silos, inconsistent definitions, and logic hidden in tribal knowledge. Agents need structured, reliable, and interpretable context—not just data access. In this session, Andrew will show how Apache Iceberg becomes the backbone of AI-ready pipelines. You'll learn how to elevate your Iceberg implementation from a storage format to a live context layer that powers structured retrieval-augmented generation (RAG), schema-aware agents, and autonomous reasoning grounded in truth.
The session will cover:
1. Iceberg Foundations for AI - from ACID to Time Travel
2. From Rows to Relationships - The role of the semantic layer
3. Structured RAG in Practice - Fully open source
The session will also include a live demo of a fully open-source Structured RAG stack built on Apache Iceberg, featuring semantic query translation, hybrid retrieval, and governed agent reasoning. Expect architecture diagrams, real code, and practical guidance.
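As a taste of what schema-aware grounding can look like, here is a hedged Python sketch that hands an Iceberg table's schema to an agent as context via PyIceberg. The catalog name, table identifier, and ask_agent function are assumptions for illustration and may differ from the stack shown in the session.

```python
# Hedged sketch: expose an Iceberg table's schema to an agent as grounding
# context. Assumes a PyIceberg catalog is already configured (for example
# via ~/.pyiceberg.yaml); the catalog name, table identifier, and ask_agent
# are illustrative assumptions, not the session's actual stack.

from pyiceberg.catalog import load_catalog


def ask_agent(question: str, context: str) -> str:
    """Placeholder for your agent / LLM call of choice."""
    return f"[stubbed agent] question={question!r}, context of {len(context)} chars"


catalog = load_catalog("default")                # assumed catalog name
table = catalog.load_table("analytics.orders")   # assumed table identifier

# The schema (column names, types, docs) becomes grounding context, so the
# agent can reason and generate queries against real structure, not guesses.
schema_context = str(table.schema())

print(ask_agent(
    "Which columns could identify late deliveries, and how would you query them?",
    context=schema_context,
))
```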

Hands-on Data Product: let's build a data product in 30 minutes (hands-on workshop)

Jean-Georges Perrin

Everything’s bigger in Texas, including the expectations we put on data. Yet for many teams, “data products” remain a buzzword rather than something concrete they can build, ship, and trust.
In this fast-paced, hands-on session at Data Day Texas, we will build a real data product from scratch in just 30 minutes. Expect a maximum of five slides to set the scene, then we quickly switch to hands-on work with real code. This is a practical session, not a theory talk.
Starting from a simple business use case, we will define what makes a data product valuable, design its data contract, describe its interface, and make it ready for consumption. You will see how open standards such as ODCS and ODPS help turn implicit assumptions into explicit, testable artifacts that teams can rely on.
Think of it like a Texas BBQ recipe for data: clear ingredients, clear ownership, and a result people can actually use.
You will leave with a concrete mental model, reusable patterns, and the confidence that building data products does not require a six-month transformation program, just the right mindset and a few solid building blocks.
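As a small, hedged illustration of what an explicit, testable data contract can look like, here is a Python sketch that checks sample records against a contract. The field names are made up for illustration, and this is not an authoritative ODCS or ODPS document.

```python
# Illustrative sketch only: a tiny, explicit "data contract" expressed in
# Python and checked against a sample record. Field names are made up for
# illustration; this is not an authoritative ODCS or ODPS document.

contract = {
    "name": "customer_orders",
    "owner": "orders-team@example.com",          # assumed owner, for illustration
    "fields": {
        "order_id": {"type": int, "required": True},
        "amount_usd": {"type": float, "required": True},
        "coupon_code": {"type": str, "required": False},
    },
}


def validate(record, contract):
    """Return a list of contract violations for one record."""
    errors = []
    for name, spec in contract["fields"].items():
        if name not in record:
            if spec["required"]:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], spec["type"]):
            errors.append(f"wrong type for {name}: expected {spec['type'].__name__}")
    return errors


print(validate({"order_id": 42, "amount_usd": "19.99"}, contract))
# -> ['wrong type for amount_usd: expected float']
```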

Data Day Texas 2026 is made possible by the generosity of our Patrons and Partners.

These organizations support the data community by making Data Day Texas accessible to practitioners at all career stages.