Apache Cassandra at Data Day Texas

At the upcoming edition of Data Day, we will be hosting multiple tracks devoted to Apache Cassandra and its ecosystem. We are still accepting proposals for talks and mini-workshops in the Cassandra track. Are you doing something cool with Apache Cassandra? Share it with the community at Data Day Texas!

Confirmed Sessions

Cassandra and Kubernetes

Ben Bromhead - Instaclustr

Kubernetes has become the most popular container orchestration and management API with cloud-native support from AWS, GCP, Azure and a growing enterprise support ecosystem. Leveraging Kubernetes to provide tested, repeatable deployment patterns that follow best practices is a win for both developers and operators.
In this talk, Ben Bromhead, CTO of Instaclustr, will introduce the Cassandra Kubernetes Operator, a Cassandra controller that provides robust, managed Cassandra deployments on Kubernetes. By adopting Kubernetes and Cassandra, you can provide DBaaS-like services rapidly and easily to the rest of your team and have a simple on-ramp to true multi-cloud capabilities for your environment.

Cassandra Performance Tuning and Crushing SLAs

Jon Haddad - The Last Pickle

In an ideal world, everything would just be fast out of the box. Unfortunately, we’re not quite there yet. Getting the best performance out of a database means understanding your entire system, from the hardware and OS to the database’s internals. In this talk, Jon Haddad will discuss a wide range of performance tuning techniques. We’ll start by examining how to measure and interpret the statistics from the different components on our machines. Once we understand how to identify what exactly is holding our performance back, we can take the necessary steps to address the problem and move to the next issue. We’ll examine common pitfalls and problems, learning how to tune counters, compaction, garbage collection, compression, and more. If you’re working on a low-latency, high-throughput system, you won’t want to miss this talk.

Performance Data Modeling at Scale

Aaron Ploetz - Target

The most important aspect of backing your application with Cassandra is building a good data model. In addition to designing a query-based model that distributes well, performance at scale should also be a prime consideration. After all, you want good things to happen when your application gets a sudden 10x increase in traffic. At Target, the holiday season hits our infrastructure hard, and engineering to withstand that 10x increase is our reality.
In this presentation, we will examine real-world use cases and data processing scenarios. We will cover Cassandra data modeling techniques, and considerations for both high performance and large scale. Performance engineering of existing models will also be discussed, along with ways to get that extra bit of lower latency.
Intended audience: Cassandra DBAs, developers, and data modelers.

Go big or go home! Does it still make sense to do Big Data with Small Nodes?

Glauber Costa - ScyllaDB

In the world of Big Data, scaling out is the norm. The prospect of running massive computation on commodity hardware is enticing, but what does "commodity hardware" really mean? The usual 8-core setup people have been deploying with can now be found on phones, and every cloud provider makes boxes with 32 cores and up available at the click of a button. And still, a lot of Big Data deployments are trapped in clusters that are a sea of small boxes.
With the advent of scalable platforms like ScyllaDB, node performance is no longer an issue, and doubling the size of the nodes will usually double the available storage, memory, and processing power. So what other reasons stop people from going big in the Cloud Native world? This talk will explore some of the popular beliefs about going big and delve into which are true and which aren't.

Confirmed Speakers

Ben Bromhead (SF Bay)

Ben Bromhead (Linkedin) is Co-founder and CTO at Instaclustr, where he sets the technical direction for the company. Ben is well known as an active member of the Apache Cassandra community. Prior to Instaclustr, Ben worked as an independent consultant developing NoSQL solutions for enterprises. He ran a high-tech cryptographic and cyber security formal testing laboratory at BAE Systems and Stratsec.
Ben will be giving the following presentation: Cassandra and Kubernetes.

Jeff Carpenter (Scottsdale, Arizona) @jscarp

Jeff Carpenter (Linkedin) is a technology evangelist at DataStax, where he leverages his background in system architecture, microservices and Apache Cassandra to help developers and operations engineers build distributed systems that are scalable, reliable, and secure. Jeff has worked on projects ranging from a complex battle planning system in an austere network environment to a cloud-based hotel reservation system, and is the author of Cassandra: The Definitive Guide, 2nd Edition.

Jonathan Ellis (Austin) @spyced

Jonathan Ellis is CTO and co-founder at DataStax. Prior to DataStax, Jonathan worked extensively with Apache Cassandra while employed at Rackspace. Prior to Rackspace, Jonathan built a multi-petabyte, scalable storage system based on Reed-Solomon encoding for backup provider Mozy.

Jon Haddad (Los Angeles) @rustyrazorblade

Jon Haddad (Linkedin) is the Principal Consultant at The Last Pickle, as well as a committer and PMC member for Apache Cassandra. Prior to The Last Pickle, Jon was a technical evangelist at DataStax. He has worked on dozens of Cassandra clusters across a wide variety of hardware, both on-prem and in the cloud. Jon has contributed to a wide variety of open source projects and has almost 20 years of experience in the field.
Jon will be giving the following Cassandra presentation: Cassandra Performance Tuning and Crushing SLAs.

Patrick McFadin (SF Bay) @patrickmcfadin

Patrick McFadin (Linkedin), VP of Developer Relations at DataStax, is regarded as one of the foremost experts on Apache Cassandra and data modeling techniques. While at DataStax, he has helped build some of the largest deployments in the world. Prior to DataStax, Patrick was Chief Architect at Hobsons, an education services company. There, he spoke often on web application design and performance.

Dikang Gu (San Francisco) @dikanggu

Dikang Gu (Linkedin) is a Staff Software Engineer at Facebook. He has years of experience working with big data/cloud computing platforms.

Aaron Ploetz (Minneapolis) @APloetz

Aaron Ploetz (Linkedin) has been a professional software developer since 1997, and has been named a DataStax MVP for Apache Cassandra three times (2014-17). While not at work, he has been a computer hobbyist since 1987 (when his mother first brought home a Tandy 1000 EX). He still works on a variety of projects in his home lab, including (but not limited to) building Linux servers, gaming machines, and test Cassandra clusters. Aaron received a Bachelor of Science degree in Management/Computer Systems from the University of Wisconsin - Whitewater in 1998, and a Master of Science degree in Software Engineering (emphasis on Database Technologies) from Regis University in 2013. He and his wife Coriene live with their three children in the Twin Cities. When not in front of a computer he enjoys amateur astronomy, writing, and coaching his sons' baseball and ice hockey teams.
Aaron will be presenting the following Cassandra session: Performance Data Modeling at Scale.

Glauber Costa (Toronto) @glcst

Glauber Costa (Linkedin) is a Principal Architect at ScyllaDB. He splits his time between the engineering department, where he works on upcoming Scylla features, and helping customers succeed.
Before ScyllaDB, Glauber worked on virtualization in the Linux kernel for 10 years, with contributions ranging from the Xen and KVM hypervisors to all sorts of guest functionality and containers.
Glauber will be presenting the following session: Go big or go home! Does it still make sense to do Big Data with Small Nodes?