O'Reilly Author Showcase at Data Day

O'Reilly authors have been a part of Data Day Texas since the very beginning. Some of those who've presented over the years are: Gwen Shapira, Holden Karau, Jay Kreps, Emil Efrim, WesMcKinney, Sandy Ryza, Russell Jurney, Ted Dunning, Eric Sammer, Josh Wills, Julia Silge, Sean Owen, Amy Hodler, Andy Petrella, Ryan Mitchell, Joey Echeverria, Denise Gosnell, Matthias Broecheler, Matthew Russell, Eric Lubow, Eli Bressert, Matthew Kirk, Carl Anderson, Ed Capriolo, Charity Majors, Jeff Carpenter, Tim Bergland, Ryan Boyd, David Robinson, Hadley Wickham, Laine Campbell, Hari Shreedharan, Dean Wampler, Mark Grover, and many more.

During the past year, the folks at O'Reilly Media have published a tremendous number of new titles from both well-known and up and coming experts - with many more schedule for publication. Consequently, we're bringing back the O'Reilly Author Showcase for Data Day Texas 2023. Throughout the day, we'll host Meet and Greet and Ask Me Anything sessions with the many attending O'Reilly authors.

The O'Reilly exhibit from Data Day Texas 2015. Like every other year, by the end of the day, all the books were gone.

Zhamak Dehghani, Data Mesh

In Data Mesh, author Zhamak Dehghani introduces a decentralized sociotechnical paradigm drawn from modern distributed architecture that provides a new approach to sourcing, sharing, accessing, and managing analytical data at scale. Dehghani guides practitioners, architects, technical leaders, and decision makers on their journey from traditional big data architecture to a distributed and multidimensional approach to analytical data management. Data mesh treats data as a product, considers domains as a primary concern, applies platform thinking to create self-serve data infrastructure, and introduces a federated computational model of data governance.

Adi Polak, Scaling Machine Learning

Adi Polak, VP DevEx at Treeverse and author of the upcoming Scaling Machine Learning with Spark, will be giving the Data Engineering Keynote at Data Day Texas 2023. Scaling Machine Learning with Spark examines various technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLFlow, TensorFlow, PyTorch, and Petastorm. The book covers data ingestion, preprocessing, feature engineering, training models, and bridging Spark and deep learning frameworks..

Hala Nelson, Essential Math for AI

Hala Nelson's accessible guide walks you through the math necessary to thrive in the AI field such as focusing on real-world applications rather than dense academic theory. Engineers, data scientists, and students alike will examine mathematical topics critical for AI--including regression, neural networks, optimization, backpropagation, convolution, Markov chains, and more--through popular applications such as computer vision, natural language processing, and automated systems. And supplementary Jupyter notebooks shed light on examples with Python code and visualizations. Whether you're just beginning your career or have years of experience, this book gives you the foundation necessary to dive deeper in the field.

Gwen Shapira, Kafka: The Definitive Guide, 2nd Edition

With this updated edition, application architects, developers, and production engineers new to the Kafka streaming platform will learn how to handle data in motion. Additional chapters cover Kafka's AdminClient API, transactions, new security features, and tooling changes. Through detailed examples, you'll learn Kafka's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.

Holden Karau, Scaling Python with Ray

With the Ray general-purpose serverless implementation in Python, programmers and data scientists can hide servers, implement stateful applications, support direct communication between tasks, and access hardware accelerators. In this book, experienced software architecture practitioners Holden Karau and Boris Lublinsky show you how to scale existing Python applications and pipelines, allowing you to stay in the Python ecosystem while reducing single points of failure and manual scheduling.

Bonny McClain, Location Intelligence

In spatial data science, things in closer proximity to one another likely have more in common than things that are farther apart. Author Bonny P. McClain demonstrates why detecting and quantifying patterns in geospatial data is vital. Both proprietary and open source platforms allow you to process and visualize spatial information. This book is for people familiar with data analysis or visualization who are eager to explore geospatial integration with Python.

Joe Reis, Fundamentals of Data Engineering

Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology.

Andy Petrella, Data Observability

Author Andy Petrella shows the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. He discusses ways to introduce data observability, including setting up a framework for generating and collecting all the information you need. Andy also covers how to use data observability to create a trustable communication framework with data consumers, and how to educate your peers about the benefits of data observability

Patrick McFadin, Managing Cloud Native Data

Patrick McFadin has been a force in the Apache Cassandra community for over a decade. His Managing Cloud Native Data with Kubernetes, co-authored with Jeff Carpenter, will be out in December, just in time for Data Day Texas 2023. Does that mean that Patrick will be talking about Kubernetes? Maybe.

Jeffrey Carpenter, Cassandra, The Definitive Guide

Jeffrey Carpenter joined the original author, Eben Hewitt, first in 2016 to publish the 2nd edition of Cassandra, The Definitive Guide. Together, they release the 3rd edition in 2020, and revised it again for January 2022. For anyone getting started with Apache Cassandra, this is the book. Check out the book's webpage at O'Reilly for all the details.

More authors to come...

Do you have a favorite O'Reilly author you'd like us to invite to Data Day Texas? Send us a note and let us know!

I first became aware of O'Reilly 1990, when I took over management of a bookstore across the street from the University of Texas. No sooner than I got behind the counter I started hearing requests for animal books. Several months later, the store had a whole case devoted to them. A few years later, it was a whole wall. Except for a few classics, we didn't have a computer book section - we had an O'Reilly section. This endeared us to the hackers and CS students who frequented the store. It wasn't long before I was reading the books myself. O'Reilly provided my initial CS education. When other publishers were printing 1000 page "bibles" - the HTML Bible, the Bash Bible - O'Reilly was publishing inexpensive right-sized books with just the information you need - and they were always first with the new technologies. -Lynn Bender, Global Data Geeks