The Data Day Texas 2026 Sessions

We are continuing to publish session information.

The Joy of SQL - If Properly Implemented

Hannes Mühleisen

Much has been said about the imminent demise of the Structured Query Language. Perhaps partly because of its expressive power, SQL is supposedly user-hostile, hard to compose, or simply too old. For better or worse, there are also multiple interpretations of what SQL should be: every database vendor has its own dialect with its own idiosyncrasies.
However, the alleged clunkiness of SQL is merely a byproduct of the industry's aversion to change, along with the challenges of modifying massive, ancient code bases. There is no fundamental need for your SQL experience to be clunky.
In this session, Hannes will describe DuckDB's approach to query languages. Starting with SQL, DuckDB has removed many needless hurdles, greatly increasing the capabilities and forgiveness of the language. DuckDB has also recently overhauled its soon-to-be-released command-line interface. The result is a dialect that hopefully instills joy in data practitioners with a fresh look at SQL. And if you insist, DuckDB even allows you to bring your own language.

AI for Business Analytics: the Time is Now

Ryan Boyd

LLMs excel at natural language understanding but struggle with factual accuracy when aggregating business data. Yet the infrastructure to bridge this gap finally exists. This talk explores the architectural patterns that make AI agents reliable for business analytics—from semantic modeling and the Model Context Protocol (MCP) to hypertenancy architectures that prevent runaway costs and resource collisions. Through practical examples, we'll examine how isolated compute and modern integrations with custom tooling are making AI-powered analytics accessible and trustworthy for organizations ready to move beyond dashboards.

The Enterprise of the Future Runs on Ontologies: Making AI Agents Actually Work

Adriano Vlad-Starrabba

Enterprise AI is shifting from generative creativity to active, agentic execution. AI will be measured not by what it can generate, but by how reliably it can operate. The problem? Most enterprise data is not ready. It is fragmented across databases, warehouses and platforms, with zero context agents can actually use.
Ontologies provide the missing operational context by connecting the key concepts in your organization and the rules that govern them. Yet, using ontologies has traditionally been almost impossible, requiring GraphDBs, costly migrations, OWL/RDF models, expensive consulting, and black box solutions.
This session demonstrates how to make ontologies actually usable by connecting business concepts to your data anywhere it lives, unlocking data processing across any source without migrations, complex pipelines, or data duplication.
Through practical finance use cases, you'll see:

• Detecting fraud networks with advanced graph analytics, without a GraphDB
• Turning manual Excel workflows into automated, deterministic processing
• Powering auditable, deterministic agentic workflows with full lineage

Learn how ontologies can become executable and operational, enabling data processing across any source to automate human and AI workflows and power the next generation of agentic applications.

see also #ontologies #agents

Generative AI and Business Value: Why Corporate Deployment Falls Short

Bill Inmon

Generative AI has captured the world's imagination, yet delivering measurable business value in corporate environments remains elusive. While LLMs excel at answering "notional queries"—trivia and general knowledge questions—they struggle with the structured data queries central to business operations.
This session examines the fundamental mismatch between how generative AI processes information and what corporate data environments actually require. Drawing on decades of enterprise data architecture experience, Bill Inmon identifies why LLMs fail at business queries: hallucinations without accountability, inability to provide lineage, poor handling of structured data formats, and the overwhelming volume of unfocused source documents.
The talk introduces the concept of LLM preprocessing—an architectural pattern that filters, validates, and contextualizes data before it reaches the language model. You'll learn how preprocessing can reduce document volume by orders of magnitude, add reliability scoring to outputs, provide lineage tracking, and focus LLMs on domain-specific contexts rather than general knowledge.
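As a rough illustration of the preprocessing idea, here is a minimal Python sketch; the document fields, keyword scoring heuristic, and thresholds are all invented for this example rather than taken from the talk's architecture:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str          # lineage: a stable identifier for the source document
    source: str          # lineage: which system the text came from
    text: str
    reliability: float = 0.0

def preprocess(docs, domain_terms, trusted_sources, max_docs=5):
    """Filter, score, and contextualize documents before they reach an LLM."""
    scored = []
    for d in docs:
        # Filter: keep only documents that mention the business domain at all.
        hits = sum(t in d.text.lower() for t in domain_terms)
        if hits == 0:
            continue
        # Reliability scoring: trusted sources outrank keyword relevance alone.
        d.reliability = hits + (2.0 if d.source in trusted_sources else 0.0)
        scored.append(d)
    # Volume reduction: only the top-scoring handful is sent onward.
    scored.sort(key=lambda d: d.reliability, reverse=True)
    kept = scored[:max_docs]
    # Contextualize: prepend lineage tags so every claim stays traceable.
    prompt = "\n\n".join(
        f"[{d.doc_id} | {d.source} | score={d.reliability}]\n{d.text}"
        for d in kept)
    return kept, prompt
```

The point is the shape of the pattern: filtering and scoring happen in ordinary, auditable code, so the language model only ever sees a small, ranked, lineage-tagged slice of the corpus.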
Key takeaways include understanding where generative AI naturally succeeds versus where it requires architectural support, recognizing the gap between text-based processing and structured business data requirements, and exploring preprocessing patterns that bridge the divide between LLM capabilities and enterprise needs.

see also #production-ai #business

What Apache Iceberg is Bad At… For Now

Russell Spitzer

Apache Iceberg is rapidly becoming the open source table standard, but popularity doesn't mean everyone is using it efficiently. Like all technologies, Apache Iceberg has some strong features: ACID transactions, metric pushdown, and hidden partitioning. But it also has antipatterns that will lead to a heap of trouble downstream. In this talk we'll go over streaming-induced metadata bloat, poor query performance caused by inefficient metrics, and neglected maintenance bogging down query planning. You'll learn why these issues plague users of Iceberg tables and how to avoid them. Finally, we'll go through how some of these problems won't even be an issue under new proposals like "Single File Commits" and "V4 Column Metrics" for Iceberg Spec V4. Come learn to avoid pitfalls like a pro and get a preview of what's coming next.

see also #lakehouse

How Ideas Become Books: Inside O'Reilly and Data Thought Leadership

Aaron Black

Every technical field has its unofficial rulebook: the book people cite again and again. But how do those books get chosen, shaped, and launched? In this session, Aaron Black of O'Reilly Media takes you inside the publishing process that turns individual expertise into industry knowledge. You'll discover the full lifecycle of a book, from the first proposal through writing, review, production, publication, and beyond. You will also learn what editors look for when evaluating new ideas, including how to articulate your authority, define what readers will gain, estimate the real size of an audience, and structure even the most technical material as a narrative.
Whether you are curious about publishing or simply want to understand how ideas become infrastructure, this talk offers a rare look behind the curtain.
Aaron will also discuss:
• what publishers actually look for when evaluating new ideas
• a practical framework for crafting a compelling book proposal
• how a technical book moves from idea to publication
• how audience size, narrative, and expertise shape what gets published

see also #career

Building G.V(): Why Graph Databases Desperately Need Better Tools

Arthur Bigeard

For decades, relational databases set the bar: rich IDEs, data modeling tools, query optimization, standardized languages. The ecosystem matured because developers demanded it. Graph databases promised a revolution - but shipped without the tooling. No proper debuggers. No schema visualization. No way to step through a complex Gremlin traversal to see where it breaks. Non-technical users stuck writing queries or waiting on data teams.
This session tells the story of G.V() - built out of a developer's frustration after deploying JanusGraph and realizing there was no equivalent to the tools we take for granted in the relational world. What started as a passion project for Arthur Bigeard to solve personal pain points became the most widely compatible graph database client on the market - now officially recommended by Amazon Neptune, bundled with Aerospike Graph, and supporting Neo4j, JanusGraph, Spanner Graph, Dgraph, Kuzu, PuppyGraph, and more.
In this session, Arthur Bigeard of G.V() will cover:
• Why graph databases launched without proper tooling (and why that needs to change)
• What actually makes a good graph IDE (schema-aware autocomplete, step-by-step Gremlin debugging, no-code exploration)
• The architectural decisions that let G.V() work universally across vendors
• Why running locally matters (your data stays in your network)
• What's next for graph database tooling as the ecosystem matures

see also #graph-databases

Local AI Saves People (Not Clickbait)

Chris Brousseau

2025-2026 is the decisive window for transitioning Large Language Model operations from cloud-based services to on-premise deployments. Through analysis of military AI integration, environmental impact data, performance benchmarks, and economic factors, I'll demonstrate that on-premise AI solutions now match or exceed cloud deployments in speed and quality while offering superior privacy, security, and cost predictability. This is the last chance for organizations to establish sovereign AI infrastructure before dependencies and regulations make transition prohibitively difficult.

see also #production-ai

The $1M Data Professional

Shachar Meir

We're living in a time of uncertainty and tectonic change, in our industry and in the way we do business: how we work, macro-economics and politics, an unstable job market, an evolving technological landscape.
These situations present plenty of risks and threats, but also massive opportunities. For certain people with the right mindset, this period can be the opportunity of a lifetime. I call them The $1M Data Professionals.
Learn how to navigate these challenging times, how to create more impact, and how to elevate your career.

see also #career

Brokk: Context Engineering for Large Codebases

Jonathan Ellis

Context engineering for software development requires a different paradigm than the 40-year-old IDE design optimized for reading and writing code at human speeds. In this session, Jonathan Ellis will discuss the strengths and weaknesses of current state-of-the-art LLMs for planning and writing code, and how they led to the development of Brokk, a new tool for supervising code-writing AI assistants effectively in million-line codebases. Jonathan will cover the principles behind Brokk's design, how Brokk leverages static analysis to help LLMs understand your code at a level deeper than raw text, and how it helps you not just generate code but review it effectively as well.

see also #context #software

Stop Guessing, Start Measuring: A Decade of Database Experimentation and Tuning

Jon Haddad

Most database performance tuning starts with assumptions and ends with hope. I took a different approach: build disposable lab environments, run systematic experiments, and capture what actually works. The result is a feedback system that has helped teams cut infrastructure costs by tens of millions of dollars. I'll walk through the technical architecture, share hard-won lessons from almost ten years of development, and demonstrate how continuous experimentation beats tribal knowledge every time.

see also #systems

The Soul of the Machine: How Your Business Philosophy is the Ultimate AI Differentiator

Jordan Morrow

In the race to adopt AI, many organizations focus on the "what"—the models, the data, the platforms. But the real competitive advantage lies in the "why"—the core business philosophy that guides your company. This session will provide a practical framework for translating your organization's unique values, mission, and strategic goals into tangible AI-driven value.
We will explore how to:
Deconstruct your business philosophy and translate it into your AI strategy.
Move beyond generic AI use cases and identify unique opportunities that align with your company's DNA.
Foster a culture of "philosophy-first" AI development, where data scientists, engineers, and business leaders work together to build solutions that are not only technologically advanced but also deeply aligned with your organization's purpose.
Measure the ROI of "philosophy-driven" AI, demonstrating how a focus on your "why" can lead to a more sustainable and impactful AI transformation.
This session is for professionals who want to move beyond the technical and become true strategic partners in their organizations. You will leave with a new perspective on AI's potential and a practical toolkit for building AI that is not only intelligent but also wise.

see also #business

Code: The Untapped Metadata Source Driving Most Data Failures

Mark Freeman II

Over the past three years, collaborating with organizations implementing data contracts and writing the O'Reilly book on the topic, it has become abundantly clear that data is, surprisingly, the wrong surface for addressing systemic data issues; we actually need to focus on code. Code not only defines what data is captured and moved through software systems; it's also the ultimate source of truth for how a business operates, with its changes documented over time. This is evident in how some of the most significant internet outages can be traced not only to code changes, but also to how those changes affected underlying data models. In this talk, I share case studies and lessons learned in extracting metadata from code and using it to enforce expectations across software systems and data products.
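To make the idea of code as a metadata source concrete, here is a toy example using Python's standard `ast` module; the `Order` class and its fields are invented for illustration, and real extraction, as discussed in the talk, would cover far more than annotated class fields:

```python
import ast

# A hypothetical snippet of application code that implicitly defines a schema.
SOURCE = '''
class Order:
    order_id: int
    amount: float
    currency: str
'''

def extract_fields(source, class_name):
    """Pull field names and annotated types out of a class definition,
    yielding metadata that downstream checks (e.g. a data contract)
    could compare against the actual tables a pipeline produces."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {stmt.target.id: ast.unparse(stmt.annotation)
                    for stmt in node.body
                    if isinstance(stmt, ast.AnnAssign)}
    return {}
```

Because the extraction runs against source code rather than the data itself, a schema change shows up in review at the moment the code changes, not after bad rows land in a table.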

see also #ontologies #software #data-products

How to gather data requirements in 30 minutes or less - the Information Product Canvas

Shane Gibson

Ever struggled with lengthy, boring and confusing data requirement workshops that leave both your Stakeholders and your Data Team uncertain about what is needed and the next steps?
Imagine a Pattern Storming Workshop and Pattern Template that help gather clear, actionable data requirements in just 30 minutes, and then immediately being able to leverage those requirements to kick off your Data Design process.
Shane Gibson, author of An Agile Data Guide to Information Product Canvas, walks you through a fast-paced overview of the 12 key areas of the Canvas, explaining exactly what to capture and why each area matters.
In the second half of the session, Shane will dive into how to effectively use the requirements captured in the Canvas to kick-start your Data Design process.
You'll discover how the Information Product Canvas maps directly into organisational actions and outcomes, reducing misunderstanding, speeding up delivery, and maximizing business value.

You'll leave this session knowing exactly how to:
* Rapidly capture and clarify business-focused data requirements
* Use the Information Product Canvas to seamlessly transition from requirements to Data Design
* Accelerate the delivery of Data Assets and Information Products by aligning business stakeholders and technical data teams

see also #data-products

AI wants to do your data engineering work for you. Should you let it?

Ciro Greco

AI systems are increasingly capable of assisting with analytics work: generating SQL, suggesting transformations, diagnosing failures, and proposing changes to pipelines. In practice, however, these systems rarely operate directly on production analytics data. The limiting factor is not model capability, but the fact that most analytics platforms were never designed to tolerate frequent, automated, or partially trusted writes.
Modern analytics stacks assume slow, human-driven change, read-heavy workloads, and implicit trust in whoever runs a job. As a result, even modest automation around analytics pipelines quickly runs into hard limits: partial writes become visible, backfills are risky and expensive, failures corrupt shared tables, and reproducing or undoing changes is often impractical.
This talk argues that analytics systems need stronger semantics before AI systems can safely move from "assistants" to "operators." When automated or semi-automated processes interact with analytics data, familiar failure modes stop being edge cases and become systemic. Guardrails implemented as conventions, reviews, or best practices are not sufficient.
We examine what it means to design analytics systems that are safe for AI-driven workflows. Drawing from software engineering and database theory, we focus on a small set of foundational ideas: isolation by default, atomic execution across analytics outputs, explicit publish steps, and versioned, reproducible state.
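A minimal sketch of those foundations in Python, with the class name and API invented for illustration (real systems would implement this at the table-format or catalog layer rather than in memory):

```python
import copy

class VersionedTable:
    """Staging plus an explicit publish step: readers never see partial
    writes, and every published version can be reproduced or rolled back."""
    def __init__(self):
        self.versions = [[]]          # version 0: the empty table
        self.head = 0                 # the version readers currently see
        self._staged = None           # isolated working copy for a writer

    def begin(self):
        # Isolation by default: writes go to a private copy of the table.
        self._staged = copy.deepcopy(self.versions[self.head])

    def write(self, rows):
        assert self._staged is not None, "no open transaction"
        self._staged.extend(rows)

    def publish(self):
        # Atomic, explicit publish: the new version appears all at once.
        self.versions.append(self._staged)
        self.head = len(self.versions) - 1
        self._staged = None

    def rollback(self, version):
        # Versioned state makes undoing a bad automated change trivial.
        self.head = version

    def read(self):
        return list(self.versions[self.head])
```

Under these semantics, an AI-driven job that half-finishes simply never publishes; readers keep seeing the last good version, and a bad publish is one `rollback` away.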

see also #agents #production-ai

What is really 'Open' in an Open Lakehouse Architecture? (ft. Apache Iceberg, Hudi & Delta Lake)

Dipankar Mazumdar

There is a lot of buzz around the lakehouse architecture today, which unifies two mainstream data architectures - data warehouses and data lakes - promising to do more with less. Meanwhile, major data warehouse vendors have embraced open table formats, driven by demand for the flexibility and openness that supporting an open format promises.
Projects such as Apache Iceberg, Apache Hudi, and Delta Lake have been at the center of this shift. In addition, newer table formats like DuckLake and Lance are emerging. Together, these efforts have helped establish an open and adaptable foundation for data, enabling enterprises to choose compute engines based on their workload needs rather than being locked into proprietary storage formats.
However, as terms like "open table format" and "open data lakehouse" are often used interchangeably, there is a growing need for clarity and a deeper technical understanding of what openness actually means in this context.
In this session, we will do a technical breakdown of the lakehouse architecture and examine what truly brings openness to a data platform.

see also #lakehouse

Learning Beyond Language: A New Geometric Paradigm for Better ML

Alexandra Pasi

"Bigger is better" has been the dominant philosophy in machine learning over the past decade. This principle has manifested in a variety of ways: in the promise that bigger datasets would deliver better model performance, and in the focus on increasingly complex model-ensembling techniques. These directions have produced many impressive results in specific contexts and benchmarks, while failing to produce innovations that address the fundamental question of model generalizability outside the training set. The approach has also brought a number of ubiquitous practical challenges to deploying models in production, including frequent retrainings, slow model inference, and large, cumbersome models. New foundational approaches to the mathematical underpinnings of machine learning can improve model accuracy in production, as well as the compactness and speed of machine learning models. These novel approaches involve finding new ways of representing and leveraging the inherent geometry of the data.

see also #better-ml

Meta Grid 2026!

Ole Olesen-Bagneux

In an enterprise setting, it is pointless to create metadata from scratch - it is even counterproductive.
Metadata cannot be based on logical assumptions alone. Instead, metadata management needs to be empirical. Implementations of new metadata management technologies, however carefully crafted, will eventually fail if the empirical reality of the enterprise's metadata is not taken into account. That is what the Meta Grid is all about, and this presentation highlights how, with perspectives from the newly published book Fundamentals of Metadata Management (O'Reilly, 2025).

see also #ontologies #librarians

Designing an Apache Iceberg Lakehouse: From Requirements to a Stakeholder-Ready Architecture

Alex Merced

Building an effective Apache Iceberg lakehouse starts well before technology selection. It begins with internal research that clarifies business goals, data domains, access patterns, and stakeholder expectations. This talk presents a structured approach to lakehouse planning, starting with requirements discovery and moving through each architectural layer with intent.
The session walks through how to evaluate and design the storage layer, ingestion patterns, catalog and governance services, query federation capabilities, and consumption interfaces. For each layer, the discussion focuses on key trade-offs, interoperability considerations, and how open standards like Apache Iceberg reduce lock-in while enabling flexibility.
Attendees will leave with a practical framework for mapping stakeholder needs to concrete architectural decisions. The goal is to help data engineers, architects, and technical leaders design an Iceberg lakehouse that is scalable, governable, and aligned with both analytical and operational use cases.

see also #lakehouse

The Engineer's Guide to AI Strategy: Bridging the Gap Between Business and Technical Reality

Kierra Dotson

The industry is littered with AI "Proofs of Concept" that never reached production, and production deployments that became overnight disasters. Why? Because we have traditionally treated Strategy and Governance as "soft" business functions, the domain of slide decks and legal checklists, while treating Engineering as a mere execution arm. In this session, we will flip the script. We'll argue that for AI to be viable, engineers must treat Strategy and Governance as both hard technical requirements and soft strategic objectives: foundational elements of engineering a sound AI product. However, bridging this gap requires a fundamental shift in the engineering persona. We must move from being "executors" to "strategy setters," which requires mastering the intersection where engineering strategy informs and aligns with organizational objectives.

We will explore how to build this alignment by:
- Grounding the Vision: Ensuring top-down business goals are anchored in technical feasibility, resource constraints, and data reality.
- Codifying Governance: Moving beyond policy documents to building automated guardrails, adversarial testing, and technical safety directly into the CI/CD pipeline.
- The Strategic Engineer: Developing the communication style and technical foresight required to advise leadership not just on how to build a feature, but whether it should be built at all.

see also #production-ai #business

Context > Prompts: Context Engineering Deep Dive

Lena Hall

Context Engineering is the art and science of designing and managing the information environment that enables AI systems to function reliably at scale. This technical session examines why focusing on prompt engineering alone leads to production failures and demonstrates how proper context management transforms unreliable AI into systems that actually work. We'll explore the fundamental difference between crafting instructions (prompts) and engineering what the model actually processes (context).
You'll understand the four core activities of context engineering: persisting information outside the context window, selecting relevant information for each step, compressing histories to manage token limits, and isolating contexts to prevent interference.
The session covers critical failure modes, including context poisoning where errors compound over time, context distraction where historical information overwhelms reasoning, context confusion from irrelevant data, and context clash from contradictory information. We'll examine why these failures are inevitable without proper engineering and demonstrate specific techniques to prevent them. Through architectural patterns, we'll review context management in existing frameworks. You'll see how declarative approaches eliminate prompt string manipulation, how vector databases enable semantic memory, and how orchestration platforms coordinate context flow.
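As a toy illustration of two of the activities the session names, compressing and selecting, here is a Python sketch; the word-count budget and keyword-overlap ranking are simplistic stand-ins for real token counting and embedding-based retrieval:

```python
def compress_history(messages, budget, keep_recent=2):
    """Compress: keep the most recent turns verbatim and collapse older
    turns into one summary line. 'budget' is a crude word-count stand-in
    for a token limit."""
    recent = messages[-keep_recent:]
    older = messages[:-keep_recent]
    if older:
        summary = "Summary of %d earlier turns: %s" % (
            len(older), "; ".join(m.split(".")[0] for m in older))
        recent = [summary] + recent
    # Drop from the front until the context fits the budget.
    while len(" ".join(recent).split()) > budget and len(recent) > 1:
        recent.pop(0)
    return recent

def select_relevant(notes, query, k=2):
    """Select: pull only the stored notes that share vocabulary with
    the current query, instead of stuffing everything into context."""
    q = set(query.lower().split())
    ranked = sorted(notes, key=lambda n: len(q & set(n.lower().split())),
                    reverse=True)
    return ranked[:k]
```

Even this crude version shows the payoff: the model's working context stays small and on-topic, which is exactly what guards against the distraction and confusion failure modes above.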

see also #context

LLMs Expand Computer Programs by Adding Judgment

Jonathan Mugan

Even if large language models (LLMs) stopped improving today, they have yet to make their real impact on computer systems. This is because LLMs expand the capability of programs by adding judgment. Judgment allows computers to make decisions from fuzzy inputs and specifications, just as humans do. This talk will cover the ramifications of adding judgment to computer systems. We will start with simple judgment-enabled functions and then move to workflows. Adding judgment represents a fundamental shift in how programming is done, because instead of writing computer code, system designers will be writing higher-level specifications. We will discuss how these specifications should be written to build robust systems given the sometimes-flaky judgment of LLMs.
We will also discuss how the judgment can be expanded by calling external tools. This tool-calling is a step toward agency, and the talk will also flesh out what agency really requires and the opportunities that it creates. Programs with agency can evaluate their own outputs and therefore improve them, leading to better outputs, which can then be improved. This virtuous cycle enables computer systems to begin to reflect biological ones, possibly even leading to conscious machines.
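A minimal sketch of a judgment-enabled function in Python; the keyword classifier below stands in for an LLM call, and all names are invented for illustration:

```python
def keyword_judge(text):
    """Stand-in for an LLM judgment call that classifies fuzzy input.
    A real system would send 'text' to a model here."""
    text = text.lower()
    if any(w in text for w in ("refund", "charged twice", "cancel")):
        return "billing"
    if any(w in text for w in ("crash", "error", "broken")):
        return "technical"
    return "general"

def route_ticket(message, handlers):
    """A judgment-enabled function: the control flow is conventional code,
    but the branch condition is decided by judgment over fuzzy input
    rather than by an exact match."""
    label = keyword_judge(message)
    return handlers.get(label, handlers["general"])(message)
```

The structural point survives the toy judge: the surrounding program stays deterministic and testable, while one clearly marked decision point absorbs the fuzziness.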

see also #agents

Data Governance Keynote: Existence over Essence? Data Governance in times of AI

Winfried Adalbert Etzel

Data governance lies at the heart of socio-technical systems, including the foundation for how organizations integrate automation, AI, and AI agents. As organizations shift their AI focus from innovation to adaptation, the challenge extends beyond traditional governance to include the ethical, strategic, and operational implications of autonomous, probabilistic systems. AI is no longer only a tool but a force that is changing the composition of socio-technical systems, reshaping human-machine interactions, accountability structures, and organizational decision-making.

Data governance must refocus on its core and embrace the roles it has to play:
1. Data Negotiation, aligning business, regulatory, and technological demands with data requirements.
2. Data Direction, providing strategic orientation to ensure data and AI contribute to organizational purpose and values.
3. Data Audit, embedding accountability across human and machine decision-making.

The move from essence (data as static value) to existence (value through context and application) reframes governance as a force that intentionally structures socio-technical systems. To operationalize AI at scale, organizations must unify data and AI governance, embedding transparency, fairness, and human oversight into adaptive feedback loops. Ultimately, governance is less about controlling and more about shaping organizational meaning, ensuring that AI amplifies human agency rather than eroding it.

see also #governance

Scar Tissue: Lessons from 20 Years of Building Ontologies and Knowledge Graphs

Juan Sequeda

Knowledge graphs and ontologies are now everywhere and having a moment. But for many of us, this work started long before the hype.
I started working on ontologies and knowledge graphs (i.e. semantic web) in 2005, developing research-backed approaches to derive knowledge graphs from relational databases. Since then, I've taken that work beyond academia, into startups, products, and enterprises struggling to turn data into shared understanding.
This talk shares the scar tissue. This isn't just a retrospective talk. It's a practical guide grounded in experience, failure, and persistence.
I'll cover the patterns I've seen repeat for 20 years: why efforts succeed and fail, what is still a challenge, why teams underestimate knowledge work and get stuck, how organizations can build semantic foundations without boiling the ocean, architectural approaches… and what I wish I had understood earlier, technically, organizationally, and culturally.
If you're building, buying, or betting on knowledge graphs today, this talk is a candid set of lessons learned from someone who's been stubbornly working on the same core problem for 20 years, and is still learning. This talk may save you money and years of pain.

see also #ontologies #graph-databases

How to Hack An Agent in 10 Prompts: and other true stories from 2025

Matthew Sharp

There's never been a better time to be a hacker. With the explosion of vibe-coded solutions full of vulnerabilities, and the power and ease that LLMs and agents lend to attackers, we are seeing an increase in attacks. This talk dives into several vulnerabilities that agent systems have introduced and how they are already being exploited.

see also #security #agents

2026 Trends: Building Foundations That Endure

Sanjeev Mohan

The data and AI landscape is shifting from experimentation to production at scale. This session examines the architectural and organizational trends reshaping how enterprises build lasting data foundations in 2026.
We'll explore critical dimensions: what's changing in the technology landscape, the new approaches emerging in response, and how to prepare your organization.
Drawing from recent analyst research and enterprise engagements across multiple industries, this talk provides guidance for data leaders navigating the transition from pilot projects to production AI systems: an understanding of which trends matter for your context and how to build foundations that can evolve with technological change rather than require replacement.

see also #business

Rewriting SQLite in Rust. Here's What We've Learned.

Glauber Costa

SQLite is everywhere. SQLite is the most successful database ever built. Three billion installations, maybe more. It's on your phone, in your browser, running your message store.
It's also a synchronous, single-machine system designed when the web was young. No async I/O. No replication. No network protocol. This was fine—until developers started wanting to run it at the edge, sync it across devices, scale it to millions of databases.
In this session, Glauber Costa will show what it takes to make SQLite work for systems it was never meant to handle.
Not because SQLite is broken - it's not. But…
You can't bolt async onto a synchronous codebase.
You can't add memory safety to 150,000 lines of C without rewriting it.
And if you're going to rewrite it anyway, you might as well rethink the whole thing—which is what Glauber and his team at Turso have done. They are rewriting SQLite in Rust.
Glauber will share how they started with libSQL—a fork that's running production systems like Astro DB right now—and then dive into Turso: native async, deterministic simulation testing, the whole database reimagined for distributed systems.
You'll see how deterministic simulation testing finds bugs traditional testing misses. Why async matters even for local queries. What the actual technical tradeoffs are when you're trying to preserve compatibility while changing everything underneath. Where the edges are—what works, what doesn't, what they're still figuring out.
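As a toy illustration of the deterministic-simulation idea (not Turso's actual implementation), here is a Python sketch with a deliberately planted bug for the harness to find; every name in it is invented:

```python
import random

def run_simulation(seed, ops=50):
    """One seed fully determines the schedule, including injected I/O
    faults, so any failing run can be replayed exactly to debug it."""
    rng = random.Random(seed)
    committed, disk = [], []
    for i in range(ops):
        committed.append(i)            # BUG: acknowledge before the write lands
        if rng.random() < 0.05:        # injected fault: a crash drops the write
            continue
        disk.append(i)
    # Invariant: every acknowledged commit must actually be on disk.
    return committed == disk

def find_failing_seed(limit=1000):
    """Sweep seeds until the invariant breaks; the seed is the bug report."""
    for seed in range(limit):
        if not run_simulation(seed):
            return seed
    return None
```

Because the fault schedule comes from the seed rather than from wall-clock races, the failure is perfectly reproducible, which is the property that makes this style of testing so effective for databases.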

see also #systems

Agents are eating the semantic layer

Paul Blankley

When LLMs first hit the scene, the consensus was clear: you need a semantic layer for reliable, accurate results. The benchmarks proved it. The research confirmed it. I believed it too (and was one of the first voices saying so).
Then the models got better. And the consensus stopped being true.
Today, a semantic layer doesn't give your agent accuracy. It gives your agent a ceiling. It limits flexibility, constrains the questions your agent can answer, and forces you to anticipate every question in advance. That's not how data in enterprises actually works.
This talk covers where the semantic layer falls short, what we built instead, and how to architect agents that gather and create business context dynamically, without sacrificing governance or trust.

see also #agents #ontologies

DataOps Is Culture, Not a Toolchain

Matthew Mullins

DataOps is often framed as a collection of tools. In practice it is a culture and a set of engineering behaviors adapted from software and platform teams to the realities of data work. This talk explores the cultural foundations of DataOps, including continuous improvement, learning from failure, blameless retrospectives, and measurement. We will explore the difference between DataOps and DevOps, then define what good measurement looks like for data teams. We will map DataOps outcomes to DORA while also drawing from SPACE and DevEx to capture satisfaction, collaboration, cognitive load, and flow. You will leave with concrete rituals, metrics, and anti-patterns to watch for.
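As a small illustration of what measurement can look like, here is a Python sketch computing three of the four DORA keys from a list of pipeline releases; the input shape is invented for this example, and time-to-restore is omitted since it needs incident data:

```python
def dora_snapshot(deploys, window_days=30):
    """deploys: list of (lead_time_hours, failed) tuples for one team's
    pipeline releases over the window. Returns three of the four DORA keys."""
    n = len(deploys)
    lead_times = sorted(lt for lt, _ in deploys)
    return {
        "deploys_per_day": round(n / window_days, 2),       # deployment frequency
        "median_lead_time_h": lead_times[n // 2],           # lead time for changes
        "change_failure_rate": round(
            sum(1 for _, failed in deploys if failed) / n, 2),
    }
```

Numbers like these only help when paired with the cultural practices above; tracked blamelessly over time, they turn retrospectives into trend discussions rather than finger-pointing.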

see also #dataops #teams

Your Skills, Your Business: Layoff-Proof Your Career through Solopreneurship

Clair Sullivan

The tech industry has taught us a hard lesson: no job is guaranteed. But here's the good news: as a data scientist, you already have everything you need to take control of your career and make it layoff-proof.
In this talk, we'll explore how to transform your data expertise into a solopreneur business. You'll get an overview of the solopreneurship landscape, including the different paths available to data scientists and engineers: consulting, freelancing, fractional employment, creating digital products, and more. We'll cover how to start thinking about the financial viability of this model, how to create the actual business, where to find your first clients, and how you can even take those initial steps while still employed. By the end you will walk away with a clearer picture of what's possible and practical first steps to start exploring your options today.

see also #career

The Human Layer of Data: Why Trust Lives in the Work Behind the Numbers

Thais Cooke

In data teams, trust is the real currency, and it starts long before a single line of code is written. When teams build it with intention, everything from collaboration to decision-making becomes stronger. This talk explores the human side of data work: the conversations, shared definitions, and real-world context that bring meaning to our metrics. We'll look at how trust grows when data is not just accurate, but understood and aligned across teams. Attendees will leave with a clearer sense of how trust takes shape in everyday analytics work, and why it matters long before the numbers reach a dashboard.
Attendees should have a basic familiarity with working in data teams and an understanding of how data is used to inform decisions. Experience collaborating across roles or interacting with analytics, dashboards, or reports is helpful.
This session is completely tool-agnostic. It focuses on the human side of data work, so the concepts apply across any analytics environment, regardless of software or platform.
Attendees will leave with a clear understanding that trust is the foundation of reliable analytics and that many data challenges are human, not technical. They will learn how miscommunication, misaligned definitions, and lack of context can create the biggest obstacles, and why investing in this critical pre-work of conversations, shared definitions, and grounding metrics in a real-world context sets the stage long before a single line of code is written. The session also highlights how strengthening the human layer improves decision-making, cross-team collaboration, and organizational impact.

see also #teams

Data Visualization Keynote
"Who Needs a Chart When You Can Just Chat?" - the role of data visualization in a post-LLM world

Christian Miles

Data visualisation faces an existential question: in a world where users can simply ask questions of their data in natural language, what role remains for visual representation? In this talk, graph visualisation practitioner Christian Miles explores this tension.
While LLMs reshape how people access data, visualisation remains essential – charts surface outliers, gaps, and distributions without requiring you to ask the right question.
Christian will examine where AI advances visualisation practice (accelerated prototyping, simulated user testing, democratised tooling) while calling out current dead ends. Graph visualisation gets particular focus – a foundational model for understanding the connected world, with LLMs enabling new interactive experiences.
Christian will then examine visualisation's role in AI interpretability research and conclude with a jobs-to-be-done reframe: which jobs do charts do that LLMs structurally cannot absorb?

see also #graph-databases #data-viz

Think Like a Librarian: Fresh Perspectives for Data Teams from a Time-Honored Tradition

Jenna Jordan
Amalia Child

The path to becoming a data professional can be winding, often informal, and wildly diverse. The diversity of experiences and degrees on a data team is one of its greatest strengths, but it also means that many of us miss the chance to learn the fundamentals of organizing and delivering information. While new degree programs for analytics, data science, and data management have begun to emerge, we can look to another - much older - field for a collection of mature frameworks and grounding principles: library & information science (LIS). In this session, we will teach you how to think like a librarian by exploring three core practices from LIS that have endured through decades of advancement in technology, including AI: forming a collection development strategy, conducting a reference interview, and evaluating our services. How will you apply these mental models to level up your data team's strategy and practices?

see also #librarians #ontologies #teams

Beyond the Tech Stack: Navigating the Complete Data Ecosystem

Dylan Anderson

The data industry has become nearly impossible to navigate. We have more tools, frameworks, and methodologies than ever before. Yet the average company still fails at data transformation, with AI being the latest victim.
Why? Because we, as data professionals, can no longer see the forest for the trees. We're so focused on our individual domains that we've lost sight of the holistic picture. Meanwhile, business stakeholders feel increasingly lost in translation, and no single 'unicorn' hire can magically unite these fragmented worlds.
This talk maps the complete Data Ecosystem through a practical lens. We'll start with real pain points gathered from interviews with dozens of data leaders and patterns observed across numerous client engagements. Then we'll explore how these challenges manifest across the Data Ecosystem and how to actionably address them.
Finally, how do we all pull it together? Everybody has their area of expertise in data, but to succeed in today's data & AI environment, we need to expand that expertise and appreciate the interconnected nature of our domain.
Whether you're a CDO trying to align disparate teams, an engineer wondering how to get ahead of the endless ticket queue, or an analyst frustrated by perpetual data quality issues, this talk provides an initial approach to thinking about and navigating the increasingly complex Data Ecosystem.
This isn't just data strategy; it's data cartography.

see also #governance #teams

Iceberg for Agents - Elevating Lakehouse Data Into AI-Ready Context

Andrew Madson

AI agents fail in production because, even though they're stuffed with data, they're starved for context. Better models aren't the problem. The bottleneck is the data stack: fragmented silos, inconsistent definitions, and logic hidden in tribal knowledge. Agents need structured, reliable, and interpretable context—not just data access. In this session, Andrew will show how Apache Iceberg becomes the backbone of AI-ready pipelines. You'll learn how to elevate your Iceberg implementation from a storage format to a live context layer that powers structured retrieval-augmented generation (RAG), schema-aware agents, and autonomous reasoning grounded in truth.
The session will cover:
1. Iceberg Foundations for AI - from ACID to Time Travel
2. From Rows to Relationships - The role of the semantic layer
3. Structured RAG in Practice - Fully open source
The session will also include a live demo of a fully open-source Structured RAG stack built on Apache Iceberg, featuring semantic query translation, hybrid retrieval, and governed agent reasoning. Expect architecture diagrams, real code, and practical guidance.

see also #lakehouse #agents

Hands-on Data Product: let's build a data product in 30 minutes (hands-on workshop)

Jean-Georges Perrin

Everything's bigger in Texas, including the expectations we put on data. Yet for many teams, "data products" remain a buzzword rather than something concrete they can build, ship, and trust.
In this fast-paced, hands-on session at Data Day Texas, we will build a real data product from scratch in just 30 minutes. Expect a maximum of five slides to set the scene, then we quickly switch to hands-on work with real code. This is a practical session, not a theory talk.
Starting from a simple business use case, we will define what makes a data product valuable, design its data contract, describe its interface, and make it ready for consumption. You will see how open standards such as ODCS and ODPS help turn implicit assumptions into explicit, testable artifacts that teams can rely on.
Think of it like a Texas BBQ recipe for data: clear ingredients, clear ownership, and a result people can actually use.
You will leave with a concrete mental model, reusable patterns, and the confidence that building data products does not require a six-month transformation program, just the right mindset and a few solid building blocks.

see also #data-products #dataops #governance

Data Day Texas 2026 is made possible by the generosity of our Patrons and Partners.

These organizations support the data community by making Data Day Texas accessible to practitioners at all career stages.