The Data Day Texas 2022 Sessions

We still have discount rooms at the AT&T. If you are coming from out of town, this is where all the action is. For the best selection, book a room now.

We are just now beginning to announce the sessions for Data Day Texas. We'll be adding new sessions daily.

Data Visualization Keynote
What Data Visualization Can Learn From Dance

Weidong Yang - Kinetech

Building an effective visualization strategy to understand inherently complex and high-dimensional data is challenging. Where descriptive representations fall short, data visualization can expand insight by appealing to our other senses. Dance, in the same vein, can effectively express concepts not easily expressed verbally. It is inherently dynamic, expressive, and perceptive. Many strategies for creating effective dance can be borrowed for effective data visualization, including how we relate to one another and to space through the physics of motion, tension, and rhythm. Applied properly, these strategies awaken our full perceptive ability to gain insight from complex data. They let us “dance” with the data and perceive it rather than merely describe it.
Today’s digital data visualization makes it possible to incorporate dynamics over time and opens the door for us to take advantage of our perceptive power. This talk will present several examples of how lessons learned from the practice of dance making can be applied effectively to data visualization.

Shortcut MLOps with In-Database Machine Learning

Paige Roberts - Vertica

MLOps has rocketed to prominence based on one clear problem: many machine learning projects never make it into production. According to a recent survey by Algorithmia, even among the small percentage of projects that do make it, 35% take between one month and a year to be put to work. Since data science is a cost center for organizations until those models are deployed, shortening, organizing, and streamlining the process from ideation to production is essential.
Data science is not simply for the broadening of human knowledge; data science teams get paid to find ways to shave costs and boost revenues. That can mean preventative maintenance that keeps machines online, churn reduction, customer experience improvements, targeted marketing that earns and keeps good customers, fraud prevention or cybersecurity that keeps assets safe and prevents loss, or AIOps that optimizes IT to get maximum capability for minimum cost.
To get those benefits, do you need to add yet another piece of technology to already bloated stacks? There may be a way for organizations to get machine learning into production faster with something nearly every company already has: a good analytics database.
Learn how to:
• Enable data science teams to use their preferred tools – Python, R, Jupyter – on multi-terabyte data sets
• Provide dozens of data types and formats at high scale to data science teams, without duplicating data pipeline efforts
• Make new machine learning projects just as straightforward as enabling BI teams to create a new dashboard
• Get machine learning projects from finished model to production money-maker in minutes, not months

Continuous ML Improvement: Automated Monitoring with Built-In Explainability

Amy Hodler - Fiddler AI

ML models tend to lose their predictive power over time and can fail silently. In this session, we’ll review how to identify and stay ahead of the common culprits: model drift, data integrity issues, outliers, and bias. You’ll see how cutting-edge explainable AI and model analytics can quickly find the root cause of operational issues. And we’ll outline how model and cohort comparison help teams iterate and get new models into production faster.
I’ll cover how leading enterprises are achieving trustworthy AI by integrating model performance management into their MLOps lifecycle. You’ll walk away knowing how to use continuous ML monitoring and explainability to achieve optimal model performance and accelerate business outcomes.

4 Types of ML Drift and How to Catch Them (Or “Why your AI is wrong, eventually”)

Amy Hodler - Fiddler AI

When production data no longer resembles the historical training data, our machine learning predictions become less accurate. And the pandemic taught us that forecasting and predictions can be invalidated in a matter of weeks. That sudden shift was an alarm to stay vigilant, watching not just for major changes but also for the more insidious degradation of results that all AI (yes, yours too) is prone to over time. However, separating meaningful drift from spurious or acceptable drift can be tedious – let alone uncovering which ML features are the source of the issue.
Attend this session to:
• Understand the types of machine learning drift, from concept and prediction drift to label and feature drift.
• Hear how to identify the root cause and evaluate the impact of drift with automated measures.
• See a demo of state-of-the-science drift monitoring and explainability.
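To make feature drift concrete ahead of the session, here is a minimal sketch (our illustration, not the speaker's material) that flags drift by comparing a production window against training data with a two-sample Kolmogorov-Smirnov test. The sample sizes, simulated shift, and significance threshold are all assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_col: np.ndarray, live_col: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag drift when the live distribution differs significantly from training."""
    statistic, p_value = ks_2samp(train_col, live_col)
    return p_value < alpha  # a small p-value means the distributions likely differ

rng = np.random.default_rng(seed=7)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)  # historical training feature
live = rng.normal(loc=0.4, scale=1.0, size=5_000)   # production feature; mean has shifted

print(feature_drifted(train, live))  # True: the shift is statistically detectable
```

In practice a monitoring platform runs checks like this per feature over rolling windows; the point here is only that drift detection ultimately reduces to comparing distributions.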

Fighting COVID-19 using Knowledge Graphs

Ying Ding - University of Texas

The COVID-19 pandemic has caused the tragic loss of human lives and the worst economic downturn since the Great Depression. Scientific discovery depends heavily on the knowledge accumulated by peer scientists. There are over 500,000 COVID-19-related articles published, but nobody can read them all. This talk highlights the importance of knowledge graphs and how they can help scientists and the public fight against COVID-19. It covers topics ranging from building a COVID-19 knowledge graph from scientific publications to facilitate drug discovery, to using contrastive deep learning to enhance risk prediction for COVID-19 patients, to investigating parachuting collaboration and research novelty in COVID-19-related research activities.

Business Transformers - Leveraging Transfer Learning for B2B Insights

Paul Azunre - Dun and Bradstreet

Pretrained transformer-based neural network models such as BERT, GPT-3, and XLM have rapidly advanced the state of natural language processing (NLP). Instead of training models from scratch, practitioners can now download general pretrained models and quickly adapt them to new scenarios and tasks with relatively little training data. This has led to rapid advances in applications ranging from medicine to natural language generation and chatbots. Paul will give an overview of recent advances and concepts in this area, and discuss some ways these tools can be applied to extract B2B insights – from detecting and measuring the various stages of the customer buying journey to extending any such analysis capability from English to multiple other languages.
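As a taste of how low the barrier has become, here is a minimal sketch assuming the Hugging Face transformers library; the checkpoint is a public model, and the buying-journey labels are invented for illustration rather than taken from the talk.

```python
from transformers import pipeline

# Download a general pretrained model; no task-specific training required.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "We are comparing vendors and expect to sign a contract next quarter.",
    candidate_labels=["awareness", "evaluation", "purchase decision"],  # hypothetical stages
)
print(result["labels"][0])  # the highest-scoring stage for this sentence
```

Fine-tuning the same checkpoint on a small labeled set, or swapping in a multilingual model, follows the same pattern, which is what makes transfer learning attractive for B2B text.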

For overwhelmed data professionals: What to do when there is so much to do?

Yue Cathy Chang / Jike Chong

95% of companies with data science teams have teams of fewer than ten members. On a nimble team with a wide range of possibilities for making a business impact with data science, how do you explore and prioritize the opportunities? How do you ensure that there are sponsors and champions for your project? How do you set realistic expectations with business partners for project success? If you are leading a team, how can you delegate your projects effectively?
In this talk, the presenters will share three techniques for prioritizing projects, two roles (sponsors and champions) to identify, and four levels of confidence (for predictive models) with which to specify project success, and they will discuss best practices for delegating work as a team lead or manager.

A gentle introduction to using graph neural networks on knowledge graphs

Dave Bechberger - Amazon Web Services

Knowledge graphs are a hot topic in the enterprise data space, but they often suffer from a lack of rich connections within the data. Graph neural networks (GNNs) can help fill this gap by using the structure of connections within the data to predict new connections. The two seem like natural partners; however, understanding what each of them is, and how to leverage them together, is a challenge.
In this talk we’ll provide a gentle introduction to the concepts of knowledge graphs and graph neural networks. We will discuss what a knowledge graph is, why you might want to use one, and some of the challenges you will face. We will then discuss key concepts of graph-based machine learning, including an introduction to how GNNs work, what use cases they solve, and how they can be leveraged within knowledge graphs. Finally, we’ll walk through a few demonstrations to show the power of leveraging knowledge graphs and GNNs together.
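For readers who want a preview of what "predicting new connections" looks like in code, here is a minimal sketch using PyTorch Geometric (our choice of library, not necessarily the speaker's): a two-layer GCN encodes each node into an embedding, and a dot product between two embeddings scores a candidate edge.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class LinkPredictor(torch.nn.Module):
    """Two-layer GCN encoder; a dot product between node embeddings scores an edge."""
    def __init__(self, num_features: int, hidden: int = 32):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)

    def encode(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

    def decode(self, z, src, dst):
        return (z[src] * z[dst]).sum(dim=-1)  # higher score = edge more likely

# Toy graph: 4 nodes with 3-dim features; edges stored as [sources; targets].
x = torch.randn(4, 3)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])

model = LinkPredictor(num_features=3)
z = model.encode(x, edge_index)
score = model.decode(z, torch.tensor([0]), torch.tensor([3]))  # should 0-3 be linked?
```

A real pipeline would train this with known edges as positives and sampled non-edges as negatives; the sketch only shows the encode-then-score shape of the approach.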

Introduction to Taxonomies for Data Scientists (tutorial)

Heather Hedden - Semantic Web Company

This tutorial/workshop teaches the fundamentals and best practices for creating quality taxonomies, whether for the enterprise or for specific knowledge bases in any industry. Emphasis is on serving users rather than on theory. Topics to be covered include: the appropriateness of different kinds of knowledge organization systems (taxonomies, thesauri, ontologies, etc.), standards, taxonomy concept creation and labeling, taxonomy relationship creation. The connection of taxonomies to ontologies and knowledge graphs will also be briefly discussed. There will be some interactive activities and hands-on exercises. This session will cover:
• Introduction to taxonomies and their relevance to data
• Comparisons of taxonomies and knowledge organization system types
• Standards for taxonomies and knowledge organization systems
• Taxonomy concept creation
• Preferred and alternative label creation
• Taxonomy relationship creation
• Taxonomy relationships to ontologies and knowledge graphs
• Best practices and taxonomy management software use
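As a taste of what concept, label, and relationship creation amount to in practice, here is a minimal sketch expressing a two-concept taxonomy in SKOS with the Python rdflib library. The tutorial itself may well use dedicated taxonomy management software; the example.org concepts below are invented.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS

g = Graph()
ex = "http://example.org/taxonomy/"
beverage = URIRef(ex + "beverage")
coffee = URIRef(ex + "coffee")

# Concept creation.
for concept in (beverage, coffee):
    g.add((concept, RDF.type, SKOS.Concept))

# Preferred and alternative label creation.
g.add((beverage, SKOS.prefLabel, Literal("beverage", lang="en")))
g.add((beverage, SKOS.altLabel, Literal("drink", lang="en")))
g.add((coffee, SKOS.prefLabel, Literal("coffee", lang="en")))

# Hierarchical relationship creation: coffee is narrower than beverage.
g.add((coffee, SKOS.broader, beverage))

print(g.serialize(format="turtle"))
```

Because SKOS is a Semantic Web standard, the same triples can later be linked to ontologies and knowledge graphs, which is exactly the connection the tutorial touches on.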

The Future of Taxonomies - Linking data to knowledge

Heather Hedden - Semantic Web Company

Taxonomies are no longer just for navigation or tagging documents. AI technologies such as machine learning and NLP are now being used more in combination with taxonomies than merely in place of them. The results include applications in semantic search, personalization, recommendation, and question answering. Combining taxonomies with ontologies and linked instance data supports more powerful search and analytics across all kinds of data, structured and unstructured, not just documents. Topics to be discussed include:
- Trends in the use of taxonomies (industries, applications, implementation, and management) and the benefits of taxonomies
- How knowledge-based recommendation and personalization systems are built
- How NLP and taxonomies support each other
- How taxonomies contribute to knowledge graphs
- Semantic Web standards and their benefits for taxonomies and other knowledge organization systems

Visual timeline analytics: applying concepts from graph theory to timeline and time series data

Corey Lanum - Cambridge Intelligence

Timelines are one of the most basic forms of data visualization, and plotting events on a timeline is usually a straightforward process. But what happens when we have connections in that data? Time-based data often takes on the properties of both time series and graph data sets, but the traditional visualizations for each are lacking. Time series visualizations like a stock ticker chart can show how a variable changes over time, but not how data elements are connected to one another. And node-link charts can show relationships, but not how they may be changing over time. This talk introduces the new concept of visual timeline analysis, a kind of visualization designed to detangle complex time-based connected data. Corey will share his experience helping organizations harness visual timelines in their applications, taking inspiration from the graph world. He’ll show examples of how a visual timeline can display connected data over time, and how looking at the data this way can produce some really interesting insights.

Using Reproducible Experiments To Create Better Models

Milecia McGregor - Iterative

It's easy to lose track of which changes gave you the best result when you start exploring multiple model architectures. Tracking the changes in your hyperparameter values, along with code and data changes, will help you build a more efficient model by giving you an exact reproduction of the conditions that made the model better.
In this talk, you will learn how to use the open-source tool DVC to increase reproducibility for two methods of tuning hyperparameters: grid search and random search. We'll go through a live demo of setting up and running grid search and random search experiments. By the end of the talk, you'll know how to add reproducibility to your existing projects.
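For context on the two tuning strategies themselves, here is a minimal sketch of grid search versus random search using scikit-learn; the estimator and parameter grid are illustrative, and DVC's contribution (tracking and reproducing runs like these) is what the live demo adds on top.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
params = {"n_estimators": [50, 100, 200], "max_depth": [2, 4, 8]}

# Grid search: exhaustively tries every combination (9 candidates here).
grid = GridSearchCV(RandomForestClassifier(random_state=0), params, cv=3).fit(X, y)

# Random search: samples a fixed number of combinations from the same space.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0), params, n_iter=5, cv=3, random_state=0
).fit(X, y)

print(grid.best_params_, rand.best_params_)
```

Each run's hyperparameter values, code version, and data version are exactly what a tool like DVC snapshots so that the best result can be reproduced later.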

What is Truth? - Strategies for managing semantic triples in large complex systems

Ryan Mitchell - Gerson Lehrman Group

Knowledge graphs and their ontology-based database kin have exploded in popularity in recent years. Through cheap data sources and powerful inference engines, they can grow extremely quickly. But when bad data gets in, their judgments can be poisoned just as fast. We will discuss strategies for managing semantic triples and assessing their “truthiness” in large complex systems with many sources of data, machine learning algorithms, and fact validation methods.
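One naive way to picture the problem, purely our illustration and not the talk's method: attach provenance to every triple and derive a "truthiness" score from the trustworthiness of its sources.

```python
from dataclasses import dataclass, field

@dataclass
class Assertion:
    """A semantic triple plus the provenance needed to judge its truthiness."""
    subject: str
    predicate: str
    obj: str
    sources: set = field(default_factory=set)

    def confidence(self, source_weights: dict) -> float:
        # Naive scheme: a triple is as trustworthy as its most trusted source.
        return max((source_weights.get(s, 0.1) for s in self.sources), default=0.0)

fact = Assertion("acme_corp", "headquartered_in", "Austin",
                 sources={"sec_filing", "web_scrape"})
weights = {"sec_filing": 0.95, "web_scrape": 0.40}
print(fact.confidence(weights))  # 0.95: keep; a scrape-only triple might be quarantined
```

Real systems combine many more signals (source agreement, recency, fact-validation models), which is where the strategies in this talk come in.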

A Path to Strong AI

Jonathan Mugan - DeUmbra

We need strong artificial intelligence (AI) so it can help us understand the nature of the universe to satiate our curiosity, devise cures for diseases to ease our suffering, and expand to other star systems to ensure our survival. To do all this, AI must be able to learn condensed representations of the environment in the form of models that enable it to recognize entities, infer missing information, and predict events. And because the universe is of almost infinite complexity, AI must be able to compose these models dynamically to generate combinatorial representations to match this complexity. This talk will cover why models are needed for robust intelligence and why an intelligent agent must be able to dynamically compose those models. It will also cover the current state of the art in AI model building, discussing both the strengths and weaknesses of neural networks and probabilistic programming. The talk will cover how we can train an AI to build models that will enable it to be robust, and it will conclude with how we can effectively evaluate AI using technology from computer-animated movies and video games.
This talk is based on an article Jonathan will be publishing in The Gradient.

History of Network Science - A Look at How Networks Have Connected Us

Sean Robinson - Graphable

Graph analytics and network science have been on a steady rise for several years. Today we see top companies offering new ways to leverage the interconnected power of networks to answer the increasingly complex and subtle questions businesses are asking of their data. However, using networks to model data is a technique that has developed over hundreds of years. This talk takes viewers through a timeline of network science as we explore how people have taken on novel challenges in the real world and used networks to solve them. Beginning in 1736 with the Seven Bridges of Königsberg and continuing through to today’s innovations in network science, we will explore the challenges that drove early innovators to rely on the power of networks, so that we can both understand the groundwork that led to the innovations we see arising today and gain insight into how those innovations can shape our own solutions to everyday data problems.

Outrageous ideas for Graph Databases

Max De Marzi - Amazon Web Services

Almost every graph database vendor raised money in 2021. I am glad they did, because they are going to need it. Our current graph databases are terrible and need a lot of work. There, I said it. It's the ugly truth in our little niche industry. That's why, despite waiting over a decade for the "Year of the Graph" to come, we still haven't set the world on fire. Graph databases can be painfully slow, they can't handle non-graph workloads, their APIs are clunky, and their query languages are either hard to learn or hard to scale. Most graph projects require expert shepherding to succeed. 80% of the work takes 20% of the time, but that last 20% takes forever. The graph database vendors optimize for new users, not grizzled veterans. They optimize for sales, not solutions. Come listen to a rant by an industry OG on where we could go from here if we took the time to listen to the users who haven't given up on us yet.

Introduction to Graph Data Science for Python Developers (workshop)

Clair Sullivan - Neo4j

Connected data is everywhere! In data science and machine learning, we create models based on the idea that each data point within a set is an independent measurement. However, this assumption breaks down when we are talking about connected data as represented in a network graph. We see instances of this in a variety of common data science problems, from modeling customer churn to creating recommendation engines to detecting banking fraud. For example, consider the case of customer churn in social media. A traditional machine learning model can be created to predict churn that treats each user as an independent entity. We can engineer features to create vector embeddings of these users, which we then use to generate models and quantify the results. However, this approach ignores a key fact: users are not actually independent entities. A user is more likely to churn from a social media platform if one or several of their friends do first. By ignoring the relationships between data points -- the connection between two or more users (friendship, in this case) -- we weaken our ability to represent the complete picture, resulting in a significantly less accurate model.
In this workshop we will introduce the fundamentals of graph data science as explored through Python. We will explore how to enhance existing models or create new models that go beyond what can be represented in a standard relational database. Representing our data as a graph allows us to treat the connections between individual data points as a “first-class entity.” Using a graph database and real-world data, participants will learn how to analyze, manipulate, and create machine learning models to solve problems that are much harder to solve with traditional approaches.
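As a minimal preview of the kind of graph-derived feature the workshop motivates, here is a sketch assuming the networkx library; the toy friendship graph and churn labels are invented. The feature, the fraction of a user's friends who have already churned, is precisely the signal a purely tabular model never sees.

```python
import networkx as nx

# Toy social graph: edges are friendships; one user has already churned.
G = nx.Graph([("ana", "bo"), ("ana", "cy"), ("bo", "cy"), ("cy", "dee")])
churned = {"bo"}

def churned_neighbor_fraction(G: nx.Graph, user: str) -> float:
    """Graph-derived feature: fraction of a user's friends who have churned."""
    friends = list(G.neighbors(user))
    return sum(f in churned for f in friends) / len(friends) if friends else 0.0

for user in ("ana", "cy", "dee"):
    print(user, round(churned_neighbor_fraction(G, user), 2))
# ana 0.5, cy 0.33, dee 0.0: values to feed in alongside the usual tabular features
```

Embedding techniques and graph neural networks generalize this idea, learning such structural signals automatically instead of hand-engineering them one at a time.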