The Data Day Texas 2022 Sessions

We still have discount rooms at the AT&T. If you are coming from out of town, this is where all the action is. For the best selection, book a room now.

We are just now beginning to announce the sessions for Data Day Texas. We'll be adding new sessions almost daily.

A gentle introduction to using graph neural networks on knowledge graphs

Dave Bechberger - Amazon Web Services

Knowledge graphs are a hot topic in the enterprise data space, but they often suffer from a lack of rich connections within the data. Graph neural networks (GNNs) can help fill this gap by using the structure of the connections within the data to predict new connections. The two seem like natural partners; however, understanding what each one is and how to use them together is a challenge.
In this talk, we’ll provide a gentle introduction to the concepts of knowledge graphs and graph neural networks. We will discuss what a knowledge graph is, why you might want to use one, and some of the challenges you will face. We will then discuss key concepts of graph-based machine learning, including an introduction to how GNNs work, what use cases they solve, and how they can be leveraged within knowledge graphs. Finally, we’ll walk through a few demonstrations to show the power of using knowledge graphs and GNNs together.

Introduction to Taxonomies for Data Scientists (tutorial)

Heather Hedden - Semantic Web Company

This tutorial/workshop teaches the fundamentals and best practices for creating quality taxonomies, whether for the enterprise or for specific knowledge bases in any industry. The emphasis is on serving users rather than on theory. Topics include the appropriateness of different kinds of knowledge organization systems (taxonomies, thesauri, ontologies, etc.), standards, taxonomy concept creation and labeling, and taxonomy relationship creation. The connection of taxonomies to ontologies and knowledge graphs will also be briefly discussed. There will be some interactive activities and hands-on exercises. This session will cover:
• Introduction to taxonomies and their relevance to data
• Comparisons of taxonomies and knowledge organization system types
• Standards for taxonomies and knowledge organization systems
• Taxonomy concept creation
• Preferred and alternative label creation
• Taxonomy relationship creation
• Taxonomy relationships to ontologies and knowledge graphs
• Best practices and taxonomy management software use

The Future of Taxonomies - Linking data to knowledge

Heather Hedden - Semantic Web Company

Taxonomies are no longer just for navigation or tagging documents. AI technologies, such as machine learning and NLP, are now increasingly used in combination with taxonomies rather than merely in place of them. The results include applications in semantic search, personalization, recommendation, and question answering. Combining taxonomies with ontologies and linked instance data supports more powerful search and analytics across all kinds of data, structured and unstructured, not just documents. Topics to be discussed include:
- Trends in uses of taxonomies (industries, applications, implementation, and management) and the benefits of taxonomies
- How knowledge-based recommendation and personalization systems are built
- How NLP and taxonomies support each other
- How taxonomies contribute to knowledge graphs
- Semantic Web standards and their benefits for taxonomies and other knowledge organization systems

Continuous ML Improvement: Automated Monitoring with Built-In Explainability

Amy Hodler - Fiddler AI

ML models tend to lose their predictive power over time and can fail silently. In this session, we’ll review how to identify and stay ahead of the common culprits: model drift, data integrity issues, outliers, and bias. You’ll see how cutting-edge explainable AI and model analytics can quickly find the root cause of operational issues. And we’ll outline how model and cohort comparisons help teams iterate and get new models into production faster.
I’ll cover how leading enterprises are achieving trustworthy AI by integrating model performance management into their MLOps lifecycle. You’ll walk away knowing how to use continuous ML monitoring and explainability to achieve optimal model performance and accelerate business outcomes.

4 Types of ML Drift and How to Catch Them (Or “Why your AI is wrong, eventually”)

Amy Hodler - Fiddler AI

When production data no longer resembles the historical training data, our machine learning predictions become less accurate. The pandemic taught us that forecasts and predictions can be invalidated in a matter of weeks. That sudden shift was a warning to watch not just for major changes but also for the more insidious degradation of results that all AI (yes, yours too) is prone to over time. However, distinguishing meaningful drift from spurious or acceptable drift can be tedious, let alone uncovering which ML features are the source of the issue.
Attend this session to:
• Understand the types of machine learning drift, from concept and prediction drift to label and feature drift.
• Hear how to identify the root cause and evaluate the impact of drift with automated measures.
• See a demo of state-of-the-science drift monitoring and explainability.

Visual timeline analytics: applying concepts from graph theory to timeline and time series data

Corey Lanum - Cambridge Intelligence

Timelines are one of the most basic forms of data visualization, and plotting events on a timeline is usually a straightforward process. But what happens when we have connections in that data? Time-based data often takes on the properties of both time series and graph data sets, but the traditional visualizations for each fall short. Time series visualizations like a stock ticker chart can show how a variable changes over time, but not how data elements are connected to one another. And node-link charts can show relationships, but not how they change over time. This talk introduces the new concept of visual timeline analysis, a kind of visualization designed to untangle complex time-based connected data. Corey will share his experience helping organizations harness visual timelines in their applications, taking inspiration from the graph world. He’ll show examples of how a visual timeline can show connected data over time and how looking at the data this way can produce some really interesting insights.

What is Truth? - Strategies for managing semantic triples in large complex systems

Ryan Mitchell - Gerson Lehrman Group

Knowledge graphs and their ontology-based database kin have exploded in popularity in recent years. Through cheap data sources and powerful inference engines, they can grow extremely quickly. But when bad data gets in, their judgments can be poisoned just as fast. We will discuss strategies for managing semantic triples and assessing their “truthiness” in large complex systems with many sources of data, machine learning algorithms, and fact validation methods.

A Path to Strong AI

Jonathan Mugan - DeUmbra

We need strong artificial intelligence (AI) so it can help us understand the nature of the universe to satiate our curiosity, devise cures for diseases to ease our suffering, and expand to other star systems to ensure our survival. To do all this, AI must be able to learn condensed representations of the environment in the form of models that enable it to recognize entities, infer missing information, and predict events. And because the universe is of almost infinite complexity, AI must be able to compose these models dynamically to generate combinatorial representations that match this complexity. This talk will cover why models are needed for robust intelligence and why an intelligent agent must be able to dynamically compose those models. It will also cover the current state of the art in AI model building, discussing the strengths and weaknesses of both neural networks and probabilistic programming. The talk will cover how we can train an AI to build models that make it robust, and it will conclude with how we can effectively evaluate AI using technology from computer-animated movies and video games.
This talk is based on an article Jonathan will be publishing in The Gradient.

Introduction to Graph Data Science for Python Developers (workshop)

Clair Sullivan - Neo4j

Connected data is everywhere! In data science and machine learning, we create models based on the idea that each data point within a set is an independent measurement. However, this assumption is relaxed when we are talking about connected data as represented in a network graph. We see instances of this in a variety of common data science problems, ranging from modeling customer churn to creating recommendation engines to detecting banking fraud. For example, consider the case of customer churn in social media. A traditional machine learning model can be created to predict churn that treats each user as an independent entity. We can engineer features to create vector embeddings of these users, which we then use to generate models and quantify their results. However, this approach ignores a key fact: users are not actually independent entities. It is clear that a user is more likely to churn from a social media platform if one or several of their friends do so first. By ignoring the relationships between data points (in this case, the friendships connecting two or more users), we weaken our ability to represent the complete picture, resulting in a significantly less accurate model.
In this workshop, we will introduce the fundamentals of graph data science as explored through Python. We will explore how to enhance existing models or create new models that go beyond what can be represented by a standard relational database. Representing our data as a graph allows us to treat the connections between individual data points as a “first-class entity.” Using a graph database and real-world data, the participants of this workshop will learn how to analyze, manipulate, and create machine learning models to solve problems that are much more complex to solve with traditional approaches.