Apache Cassandra at Data Day Texas

At the upcoming edition of Data Day, we will be hosting multiple tracks devoted to Apache Cassandra and its ecosystem. We are still accepting proposals for talks and mini-workshops in the Cassandra track. Are you doing something cool with Apache Cassandra? Share it with the community at Data Day Texas!

Confirmed Sessions

Cassandra and the Cloud

Jonathan Ellis - DataStax

Is Apache Cassandra still relevant in an era of hosted cloud databases? DataStax CTO Jonathan Ellis will discuss Cassandra’s strengths and weaknesses relative to Amazon DynamoDB, Microsoft Azure Cosmos DB, and Google Cloud Spanner.

Cassandra and Kubernetes

Ben Bromhead - Instaclustr

Kubernetes has become the most popular container orchestration and management API, with cloud-native support from AWS, GCP, and Azure, and a growing enterprise support ecosystem. Leveraging Kubernetes to provide tested, repeatable deployment patterns that follow best practices is a win for both developers and operators.
In this talk, Ben Bromhead, CTO of Instaclustr, will introduce the Cassandra Kubernetes Operator, a Cassandra controller that provides robust, managed Cassandra deployments on Kubernetes. By adopting Kubernetes and Cassandra, you can rapidly and easily provide DBaaS-like services to the rest of your team and gain a simple on-ramp to true multi-cloud capabilities in your environment.

Cassandra Architecture FTW!

Jeff Carpenter - DataStax

In this talk we’ll take a deep dive into the architecture of Apache Cassandra to learn why it succeeds at scales where other databases fail. We’ll introduce the key distributed system design elements that Cassandra is built on, the problems that Cassandra solves especially well, and how to pair Cassandra with complementary technologies to build even more powerful systems. If you’ve heard about Cassandra and wondered if it was right for your use case, this talk is for you.

Cassandra Pluggable Storage Engine

Dikang Gu - Facebook / Pengchao Wang - Facebook

Instagram runs one of the largest Cassandra deployments. This year, the Cassandra team at Instagram has been working on a very interesting project to make Apache Cassandra's storage engine pluggable and to implement a new RocksDB-based storage engine for Cassandra. The new storage engine can significantly improve Apache Cassandra's performance.
In this talk, we will describe our motivation, the different approaches we considered, the high-level design of the solution we chose, and performance metrics from benchmark and production environments.

Cassandra Performance Tuning and Crushing SLAs

Jon Haddad - The Last Pickle

In an ideal world, everything would just be fast out of the box. Unfortunately, we’re not quite there yet. Getting the best performance out of a database means understanding your entire system, from the hardware and OS to the database’s internals. In this talk, Jon Haddad will discuss a wide range of performance tuning techniques. We’ll start by examining how to measure and interpret the statistics from the different components on our machines. Once we understand how to identify what exactly is holding our performance back, we can take the necessary steps to address the problem and move on to the next issue. We’ll examine common pitfalls and problems, learning how to tune counters, compaction, garbage collection, compression, and more. If you’re working on a low-latency, high-throughput system, you won’t want to miss this talk.

What have we done!? 10 years of Cassandra

Patrick McFadin - DataStax

Ten years ago, a couple of engineers at Facebook put a project up on Google Code, and a legend was born. The project has grown, and users have seen an enormous amount of success. Are we ready to say Apache Cassandra has won and have a party? Let me present the evidence, and we can decide as a group. No other database has delivered on the initial promises of being a reliable, performant, multi-datacenter source of record for important data. No other project, vendor, or cloud has done as well or, I would argue, ever will.
I will highlight the main use cases and data models that have put Apache Cassandra ahead of its peers. If you are new to Apache Cassandra, come learn how you are being lied to by every other database that makes this claim. If you are a veteran, let me revive some of the thinking that got you here in the first place and give you some fresh reasons to love this database of ours.

Performance Data Modeling at Scale

Aaron Ploetz - Target

The most important aspect of backing your application with Cassandra is building a good data model. In addition to designing a query-based model that distributes well, you should make performance at scale a prime consideration. After all, you want good things to happen when your application gets a sudden 10x increase in traffic. At Target, the holiday season hits our infrastructure hard, and engineering to withstand that 10x increase is our reality.
In this presentation, we will examine real-world use cases and data processing scenarios. We will cover Cassandra data modeling techniques, and considerations for both high performance and large scale. Performance engineering of existing models will also be discussed, along with ways to get that extra bit of lower latency.
Intended audience: Cassandra DBAs, developers, and data modelers.

Go big or go home! Does it still make sense to do Big Data with Small Nodes?

Glauber Costa - ScyllaDB

In the world of Big Data, scaling out is the norm. The prospect of running massive computation on commodity hardware is enticing, but what does "commodity hardware" really mean? The usual 8-core setup people have been deploying can now be found in phones, and every cloud provider makes boxes with 32 cores and up available at the click of a button. And still, a lot of Big Data deployments are trapped in clusters made of a sea of small boxes.
With the advent of scalable platforms like ScyllaDB, node performance is no longer an issue, and doubling the size of the nodes will usually double the available storage, memory, and processing power. So what other reasons stop people from going big in the Cloud Native world? This talk will explore some of the popular beliefs about it and delve into which are true and which aren't.

Confirmed Speakers

Ben Bromhead (SF Bay)

Ben Bromhead is Co-founder and CTO at Instaclustr, where he sets the technical direction for the company. Ben is well known as an active member of the Apache Cassandra community. Prior to Instaclustr, Ben worked as an independent consultant developing NoSQL solutions for enterprises. He also ran a high-tech cryptographic and cyber security formal testing laboratory at BAE Systems and Stratsec.
Ben will be giving the following presentation: Cassandra and Kubernetes.

Jeff Carpenter (Scottsdale, Arizona) @jscarp

Jeff Carpenter is a technology evangelist at DataStax, where he leverages his background in system architecture, microservices and Apache Cassandra to help empower developers and operations engineers to build distributed systems that are scalable, reliable, and secure. Jeff has worked on projects ranging from a complex battle planning system in an austere network environment to a cloud-based hotel reservation system, and he is the author of Cassandra: The Definitive Guide, 2nd Edition.
Jeff will be giving the following Cassandra presentation: Cassandra Architecture FTW!

Jonathan Ellis (Austin) @spyced

Jonathan Ellis is CTO and co-founder at DataStax. Prior to DataStax, Jonathan worked extensively with Apache Cassandra while employed at Rackspace. Prior to Rackspace, Jonathan built a multi-petabyte, scalable storage system based on Reed-Solomon encoding for backup provider Mozy.
Jonathan will be presenting the following Cassandra session: Cassandra and the Cloud.

Jon Haddad (Los Angeles) @rustyrazorblade

Jon Haddad is the Principal Consultant at The Last Pickle, as well as a committer and PMC member for Apache Cassandra. Prior to The Last Pickle, Jon was a technical evangelist at DataStax. He has worked on dozens of Cassandra clusters across a wide variety of hardware, both on-prem and in the cloud. Jon has contributed to a wide variety of open source projects and has almost 20 years of experience in the field.
Jon will be giving the following Cassandra presentation: Cassandra Performance Tuning and Crushing SLAs.

Patrick McFadin (SF Bay) @patrickmcfadin

Patrick McFadin, VP of Developer Relations at DataStax, is regarded as one of the foremost experts on Apache Cassandra and data modeling techniques. While at DataStax, he has helped build some of the largest deployments in the world. Prior to DataStax, Patrick was Chief Architect at Hobsons, an education services company, where he spoke often on web application design and performance.
Patrick will be giving the following Cassandra presentation: What have we done!? 10 years of Cassandra.

Dikang Gu (San Francisco) @dikanggu

Dikang Gu is a Staff Software Engineer at Facebook. He has years of experience working with big data and cloud computing platforms.
Dikang will be co-presenting the following Cassandra session: Cassandra Pluggable Storage Engine.

Aaron Ploetz (Minneapolis) @APloetz

Aaron Ploetz has been a professional software developer since 1997 and has been named a DataStax MVP for Apache Cassandra three times (2014-17). Outside of work, he has been a computer hobbyist since 1987 (when his mother first brought home a Tandy 1000 EX). He still works on a variety of projects in his home lab, including (but not limited to) building Linux servers, gaming machines, and test Cassandra clusters. Aaron received a Bachelor of Science degree in Management/Computer Systems from the University of Wisconsin - Whitewater in 1998, and a Master of Science degree in Software Engineering (emphasis on Database Technologies) from Regis University in 2013. He and his wife Coriene live with their three children in the Twin Cities. When not in front of a computer, he enjoys amateur astronomy, writing, and coaching his sons' baseball and ice hockey teams.
Aaron will be presenting the following Cassandra session: Performance Data Modeling at Scale.

Glauber Costa (Toronto) @glcst

Glauber Costa is a Principal Architect at ScyllaDB. He splits his time between working in the engineering department on upcoming Scylla features and helping customers succeed.
Before ScyllaDB, Glauber worked on virtualization in the Linux kernel for 10 years, with contributions ranging from the Xen and KVM hypervisors to all sorts of guest functionality and containers.
Glauber will be presenting the following session: Go big or go home! Does it still make sense to do Big Data with Small Nodes?