The O'Reilly Author Showcase at Data Day

O'Reilly authors have been a part of Data Day Texas since the very beginning. Some of those who've presented over the years are: Gwen Shapira, Holden Karau, Jay Kreps, Emil Eifrem, Wes McKinney, Sandy Ryza, Russell Jurney, Ted Dunning, Eric Sammer, Josh Wills, Julia Silge, Sean Owen, Amy Hodler, Andy Petrella, Ryan Mitchell, Joey Echeverria, Denise Gosnell, Matthias Broecheler, Matthew Russell, Eric Lubow, Eli Bressert, Matthew Kirk, Carl Anderson, Ed Capriolo, Charity Majors, Jeff Carpenter, Tim Berglund, Ryan Boyd, David Robinson, Hadley Wickham, Laine Campbell, Hari Shreedharan, Dean Wampler, Mark Grover, Bonny McClain, Patrick McFadin, and many more.

Based on the enthusiastic response from our attendees, we’re bringing back the O'Reilly Author Showcase for Data Day Texas 2024. We’ll be hosting book signings, office hours, and Ask Me Anything sessions. Here’s your chance to meet the authors and ask questions in person. We've just begun to confirm speakers and will be announcing more in the coming weeks.


The O'Reilly Author Showcase from Data Day Texas 2015.

 

Confirmed Authors for 2024

 

Susan Shu Chang, Machine Learning Interviews

In Machine Learning Interviews, author Susan Shu Chang shows how to tackle the ML hiring process.
As tech products become more prevalent today, the demand for machine learning professionals continues to grow. But the responsibilities and skill sets required of ML professionals still vary drastically from company to company, making the interview process difficult to predict. Susan will take you through the highly selective recruitment process by sharing hard-won lessons she learned along the way. You'll quickly understand how to successfully navigate your way through typical ML interviews.

 

Ole Olesen-Bagneux, The Enterprise Data Catalog

In The Enterprise Data Catalog, author Ole Olesen-Bagneux explains the benefits of implementing a data catalog. He shows how to organize data for your catalog, search for what you need, and manage data within the catalog. The book is written from both a data management perspective and a library and information science perspective. Topics covered include what a data catalog is and how it can help your organization; how to organize data and its sources into domains and describe them with metadata; how to search data using simple-to-complex search techniques and browse in domains, data lineage, and graphs; and how to manage the data in your company via a data catalog. The book then shows how to implement a data catalog in a way that matches the strategic priorities of your organization, and finally looks at what the future has in store for data catalogs.

 

Ron Itelman, Unifying Business, Data, and Code

In Unifying Business, Data, and Code, co-authors Ron Itelman and Juan Cruz Viotti show how to collaborate more effectively and design intelligent systems without having to become a data scientist. Map your team, objectives, data, actions, and outcomes as a holistic network and discover connections that may not always be obvious. You'll learn how to reveal hidden root problems and explain how information flows across your organizational networks in order to innovate better, faster.

 

Andy Petrella, Fundamentals of Data Observability

In Fundamentals of Data Observability, author Andy Petrella shows the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. He discusses ways to introduce data observability, including setting up a framework for generating and collecting all the information you need. Andy also covers how to use data observability to create a trustable communication framework with data consumers, and how to educate your peers about the benefits of data observability.

 

Alex Merced / Dipankar Mazumdar, Apache Iceberg: The Definitive Guide

Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool—a cost-prohibitive process for making warehouse features available to all of your data. This lack of flexibility forces you to adjust your workflow to the tool your data is locked in, which creates data silos and data drift. This book shows you a better way.
In Apache Iceberg: The Definitive Guide, co-authors Alex Merced and Dipankar Mazumdar show how Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this lakehouse.
Alex and Dipankar will be hosting an Apache Iceberg: Ask Me Anything session at Data Day Texas.

 

Holden Karau, Scaling Python with Dask

In Scaling Python with Dask, authors Holden Karau and Mika Kimmins show you how to use Dask computations in local systems and then scale to the cloud for heavier workloads. This practical book explains why Dask is popular among industry experts and academics and is used by organizations that include Walmart, Capital One, Harvard Medical School, and NASA.

 

Amy Hodler, Graph Algorithms

In what has been called the book on Graph Algorithms, co-authors Amy Hodler and Mark Needham explain how graph algorithms describe complex structures and reveal difficult-to-find patterns—from finding vulnerabilities and bottlenecks to detecting communities and improving machine learning predictions. The book provides hands-on examples that show how to use graph algorithms in Apache Spark and Neo4j, two of the most common choices for graph analytics.

 

Adi Polak, Scaling Machine Learning with Spark

Scaling Machine Learning with Spark, by author Adi Polak, examines various technologies for building end-to-end distributed ML workflows based on the Apache Spark ecosystem with Spark MLlib, MLflow, TensorFlow, PyTorch, and Petastorm. The book covers data ingestion, preprocessing, feature engineering, training models, and bridging Spark and deep learning frameworks.

 

Hala Nelson, Essential Math for AI

In Essential Math for AI, author Hala Nelson walks you through the math necessary to thrive in the AI field, focusing on real-world applications rather than dense academic theory. Engineers, data scientists, and students alike will examine mathematical topics critical for AI--including regression, neural networks, optimization, backpropagation, convolution, Markov chains, and more--through popular applications such as computer vision, natural language processing, and automated systems. Supplementary Jupyter notebooks shed light on examples with Python code and visualizations. Whether you're just beginning your career or have years of experience, this book gives you the foundation necessary to dive deeper in the field.

 

Joe Reis / Matt Housley, Fundamentals of Data Engineering

In Fundamentals of Data Engineering, currently a category bestseller at Amazon, Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology.

 

Patrick McFadin, Managing Cloud Native Data

Using Kubernetes as your platform, you'll learn about open source technologies that are designed and built for the cloud. Authors Jeff Carpenter and Patrick McFadin provide case studies to help you explore new use cases and avoid the pitfalls others have faced. You'll get an insider's view of what's coming from innovators who are creating next-generation architectures and infrastructure.

 

Matthias Broecheler, The Practitioner's Guide to Graph Data

In The Practitioner's Guide to Graph Data, authors Denise Koessler Gosnell and Matthias Broecheler show data engineers, data scientists, and data analysts how to solve complex problems with graph databases. You’ll explore templates for building with graph technology, along with examples that demonstrate how teams think about graph data within an application.

 

More authors to come...

Do you have a favorite O'Reilly author you'd like us to invite to Data Day Texas? Send us a note and let us know!

 


O'Reilly author Denise Gosnell leading a session on graph thinking at Data Day Texas 2020.

 


O'Reilly author Andy Petrella leading a session on observability at Data Day Texas 2022.

 


O'Reilly authors Matt Housley, Holden Karau, Andy Petrella, Patrick McFadin, Gwen Shapira, Adi Polak, and Joe Reis hanging out the night before Data Day Texas 2023.

 

Data Day Texas: it grew out of a bookstore...

In the early 90s, a small foreign language bookstore situated on the southeast corner of the University of Texas was adopted and transformed by the Austin open source / hacker community. Over the course of a decade, through several incarnations, the shop came to be known as the guerrilla computer book store. In its final years, the store's subterranean lounge became home to weekly Friday afternoon gatherings of the tech community. The gatherings were referred to as GeekAustin. When the store finally closed in 2000, the community carried on and grew, hosting hackathons, happy hours, training, and community benefit events such as Linux Against Poverty. In 2009, former bookstore owner and GeekAustin organizer Lynn Bender launched the first MongoDB Day; in 2010, the first Cassandra Summit; and in 2011, Data Day Texas.

I first became aware of O'Reilly in 1990, when I took over management of a bookstore across the street from the University of Texas. No sooner had I gotten behind the counter than I started hearing requests for animal books. Several months later, the store had a whole case devoted to them. A few years later, it was a whole wall. Except for a few classics, we didn't have a computer book section - we had an O'Reilly section. This endeared us to the hackers and CS students who frequented the store. It wasn't long before I was reading the books myself. O'Reilly provided my initial CS education. When other publishers were printing 1000-page "bibles" - the HTML Bible, the Bash Bible - O'Reilly was publishing inexpensive, right-sized books with just the information you need - and they were always first with the new technologies. -Lynn Bender, Data Day Texas