Apache Cassandra at Data Day Texas

Apache Cassandra has been a part of Data Day Texas since the very beginning. We organized the first Cassandra Summit back in 2010, and Jonathan Ellis, Apache Cassandra Project Chair from 2010 to 2016, gave the keynote at the first Data Day Texas back in 2011.

Here are some of the sessions for our 2023 Cassandra Track:

Cassandra on ACID: this changes everything
Patrick McFadin

Cassandra has established itself as the database of choice for high-scale, resilient, globally available data applications for more than ten years. All this is available with the tradeoff of working with eventually consistent data models. “I can’t use Cassandra because it doesn’t have transactions.” With the arrival of Accord and CEP-15, those days will be over soon. ACID-compliant, strictly serializable transactions are available with everything Cassandra already promises with scale, distribution, and reliability. Sounds like magic? Nope, it’s computer science, and I’ll tell you all about it. Prepare to completely level up any data-driven applications you have with way fewer tradeoffs.
What we’ll cover:
• Transactional guarantees now available
• Changes in CQL Syntax
• Examples of how it could be used in your application

Database Schema Optimization in Apache Cassandra
Artem Chebotko

Level up your data modeling skills with these five schema optimization techniques that are frequently used to create efficient and scalable data models for Apache Cassandra:
- splitting large partitions and dynamic bucketing;
- data duplication and batches;
- indexes and materialized views;
- race conditions and lightweight transactions;
- tombstone-related problems and solutions.
This presentation would be most beneficial to audiences with prior experience using Cassandra.

Why your Database needs an API
Jeffrey Carpenter

For many years, application developers have been dependent on database administrators (DBAs) in order to design schema and write efficient queries. This worked in a world in which client applications used drivers to talk to databases within the same datacenter using custom binary protocols. In our modern cloud native world, much has changed. Developers now write in a wider variety of languages than ever, and HTTP is the dominant network protocol, making traditional drivers more difficult to use effectively.
In this talk, we’ll introduce the concept of a Data API gateway and its key features, and examine the Stargate project, a data API gateway built on top of Apache Cassandra. We’ll discuss the various API styles that Stargate supports including REST, GraphQL, Document, and gRPC, and the benefits of each. We’ll also dig into Stargate’s architecture to see how it scales horizontally, abstracts the underlying database, and how easy it is to create new APIs.
Prerequisite knowledge: basic familiarity with HTTP APIs and DB query languages.

Real-Time Recommendations with Graph and Event Streaming
Aaron Ploetz

Real-time data is becoming increasingly important to the success of the enterprise mission. Successfully collecting incoming data while maintaining the ability to react both quickly and strategically is paramount.
However, the data collection process is often far from trivial. Write contention is a common bottleneck with many large-scale architectures. To further complicate matters, today’s developers are abstracted farther away from the critical areas of the write path than ever before.
How can we ensure that all of our data is being stored while not overloading the storage layer?
Focusing on the use case of a movie recommendation service, we will discuss the choices and trade-offs of different architecture components. We will show the data model implementation as well as ways to leverage our graph database to maximize data discovery. Finally, we will show ways to improve data delivery guarantees and how to mitigate write back-pressure on our database by using an event streaming platform (Apache Pulsar).

Your data infrastructure will be in Kubernetes
Patrick McFadin / Jeff Carpenter

Are people actually moving stateful workloads to K8s? Yes, yes they are. In the process of writing the book Managing Cloud Native Data on Kubernetes, we spoke with a bunch of the experts who are moving various types of stateful workloads to K8s. In this talk we’ll share what we learned:
• What’s solid: storage and workload management
• What’s good and getting better: operators, streaming, and database workloads
• What needs work: analytics and machine learning
We’ll also share what this means for your data infrastructure:
• Infrastructure should conform to your application and not the other way around
• Stop creating new data infrastructure projects and start assembling new architectures
• Look to open source projects for inspiration


Jonathan Ellis discussing future trends in databases at Data Day Texas 2020


Patrick McFadin interviewing Denise Gosnell at Data Day Texas 2020