The Data Day Texas 2022 Sessions

We still have discount rooms at the AT&T. If you are coming from out of town, this is where all the action is. For the best selection, book a room now.

We are just now beginning to announce the sessions for Data Day Texas. We'll be adding new sessions daily.

Introduction to Codeless Deep Learning

90 minute hands-on tutorial
Satoru Hayasaka - (KNIME)

Deep learning is used successfully in many data science applications, such as image processing, text processing, and fraud detection. KNIME offers an integration to the Keras libraries for deep learning, combining the codeless ease of use of KNIME Analytics Platform with the extensive coverage of deep learning paradigms by the Keras libraries. Though codeless, implementing deep learning networks still requires orientation to distinguish between the learning paradigms, the feedforward multilayer architectures, the networks for sequential data, the encoding required for text data, the convolutional layers for image data, and so on. This course will provide you with the basic orientation in the world of deep learning and the skills to assemble, train, and apply different deep learning modules.

Building a Modern Data Platform Using Advanced Redshift Topologies

Elliott Cordo - (Capsule)

Over the past few years there have been amazing advancements in technology and data architecture, complemented by recent feature releases for the AWS Redshift data warehouse platform. Features such as Data Sharing, Redshift Serverless, and Redshift Spectrum have enabled us to create flexible, performant, and dynamic data platforms. In this talk we will first explore overall data platform design considerations and common capabilities. We will then step through the different layers of the data platform, from ingest to data processing to serving, walking through architectural patterns leveraging AWS services. We will then share an architectural walkthrough of Capsule, the Pharmacy that works for everyone, and how we've set out to build a data platform that works for everyone. We will share the "why" behind our data platform, as well as the technical details on the platform capabilities, technologies, and architecture that enabled it.

What is Graph Intelligence?

Leo Meyerovich - Graphistry

Graph intelligence is going through a once-in-a-generation watershed moment. From fraud detection to supply chain analysis to user analytics, graph neural networks (GNNs) are replacing previously popular techniques due to their significant lift in results. This talk shares our studies of how modern graph intelligence has been transitioning from AI research to industry, and the rapidly mounting implications we found for data scientists, data engineers, and leaders. On the theoretical side, we overview the key concepts that have grown GNNs from an academic niche to the state of the art for long-standing scientific challenges, well beyond the use cases of the traditional graph database market. For practitioners, we first map the emerging use cases we're finding in top companies. One key disrupter in their graph data projects has been the modern data stack: it drastically improves core areas like implementation time, and in doing so motivates a rethinking of the graph stack to support graph intelligence. Finally, as graph intelligence begins its democratization phase, we discuss what is happening in key areas like making graph AI automatic, explainable, and visual.

Graph Powered Machine Learning

Jörg Schad - ArangoDB

Many powerful machine learning algorithms are based on graphs, e.g., PageRank (Pregel), recommendation engines (collaborative filtering), text summarization, and other NLP tasks. The recent developments in Graph Neural Networks connect the worlds of graphs and machine learning even further. Considering data pre-processing and feature engineering, both vital tasks in machine learning pipelines, extends this relationship across the entire ecosystem. In this session, we will investigate the entire range of graphs and machine learning with many practical exercises.
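To make the graph-algorithm connection concrete, here is a minimal sketch of PageRank as a simple power iteration over a toy directed graph. The graph and parameter values are illustrative examples, not material from the session.

```python
# Minimal PageRank via power iteration on a toy directed graph.
# links maps each node to the nodes it links to.
def pagerank(links, damping=0.85, iters=50):
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            if outs:
                # Each node passes its damped rank evenly to its out-links.
                share = damping * rank[n] / len(outs)
                for m in outs:
                    new[m] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

In this toy graph, "c" ends up with the highest rank because it is linked to by both "a" and "b"; production systems (e.g., Pregel-style engines) distribute the same iteration across many machines.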

What's up with upper ontologies?

Borislav Iordanov - Semantic Arts

Upper ontologies are domain-agnostic, highly abstract models of the world that offer a starting point for building knowledge graphs, and in particular enterprise knowledge graphs (EKGs). We will look at several well-established upper ontologies, their history, underlying philosophy, and applicability in an enterprise context. Not unlike classical software application frameworks, upper ontologies try to capture commonality and offer modeling patterns that capture best practices. As such, they facilitate modularity and reuse when building large knowledge graphs spanning multiple domains. In addition, upper ontologies make it possible for users of EKGs to operate at different levels of abstraction when "talking" to the graph, a highly desirable feature in a complex system that only a well-crafted semantic model can provide. While most upper ontologies aim to achieve semantic interoperability at internet scale, motivated by the original semantic web vision, some are purely academic in nature and others deeply rooted in the enterprise world. The Gist upper ontology, initiated by Semantic Arts, is one such ontology which grew iteratively by answering practical enterprise needs, and it will receive special attention. Hopefully, at the end of this talk you will be in a better position to answer questions like "should I use an upper ontology?", "which one?", or "should I create my own?"

Transpilers Gone Wild - announcing Hydra

Josh Shinavier - (LinkedIn)

If you have ever built an enterprise knowledge graph, you know that heterogeneity comes at a cost. The more complex the interfaces to the graph become – more domain data models, more data representation languages and data exchange formats, more programming languages in which applications and ETL code are written – the more time is spent on mappings, and the harder it becomes to keep these mappings in a consistent state. At the same time, support for heterogeneity is often what motivates us to build a graph in the first place. In a previous Data Day talk, A Graph is a Graph is a Graph, I talked about a generic approach for reconciling graph and non-graph data models. The approach was later formalized as Algebraic Property Graphs and implemented in a proprietary tool which I was ultimately not permitted to release as open source software. This time around, I would like to introduce you to a new, open-source project called Hydra which expands the scope of the problem from defining composable transformations for data and schemas, to also porting those transformations between concrete programming languages, encapsulating them in developer-friendly DSLs. Learn to love typed lambda calculi, and see how weird and wonderful things get when a transformation library starts transforming itself.

Shortcut MLOps with In-Database Machine Learning

Paige Roberts - Vertica

MLOps has rocketed to prominence based on one clear problem: many machine learning projects never make it into production. According to a recent survey by Algorithmia, 35% of even the small percentage of projects that do make it take between one month and a year to be put to work. Since data science is a cost center for organizations until those models are deployed, the need to shorten, organize, and streamline the process from ideation to production is essential.
Data science is not simply for the broadening of human knowledge; data science teams get paid to find ways to shave costs and boost revenues. That can mean preventive maintenance that keeps machines online, churn reduction, customer experience improvements, targeted marketing that earns and keeps good customers, fraud prevention or cybersecurity that keeps assets safe and prevents loss, or AIOps that optimizes IT to get maximum capabilities for minimum cost.
To get those benefits, do you need to add yet another piece of technology to already bloated stacks? There may be a way for organizations to get machine learning into production faster with something nearly every company already has: a good analytics database.
Learn how to:
• Enable data science teams to use their preferred tools – Python, R, Jupyter – on multi-terabyte data sets
• Provide dozens of data types and formats at high scale to data science teams, without duplicating data pipeline efforts
• Make new machine learning projects just as straightforward as enabling BI teams to create a new dashboard
• Get machine learning projects from finished model to production money-maker in minutes, not months

What is a metadata platform and why do you need it?

Shirshanka Das - Acryl Data

Ever wonder what the secret is behind the legendary data-driven cultures of companies like LinkedIn, Airbnb, and others? The answer is metadata!
In this session, Shirshanka Das presents the multiple ways in which metadata platforms shaped the data ecosystems at these companies, through his own journey in creating the open-source DataHub project. This project is now in wide use at companies like LinkedIn, Expedia, Saxo Bank, Peloton, Klarna, Wolt, and many others, and is part of the critical day-to-day workflows for thousands of data professionals around the world.
We will dive deep into the product and architecture of DataHub and discuss the foundations of a modern metadata platform, including capabilities like streaming event processing, schema-first extensible modeling, and time-series metadata and graph storage. These features allow multiple use cases, such as discovery, observability, and automated governance, to be implemented without having to rebuild the metadata storage, indexing, and retrieval layer multiple times.
The talk will conclude with a short demonstration of how you can get started with DataHub and accomplish interesting things within 10 minutes or less.

Ontology for Data Scientists

90 minute workshop
Michael Uschold - Semantic Arts

We start with an interactive discussion to identify the main things that data scientists do, why they do them, and what some of the key challenges are. We give a brief overview of ontology and semantic technology with the goal of identifying how and where it may be useful for data scientists.
The main part of the tutorial gives a deeper understanding of what ontologies are and how they are used. This technology grew out of core AI research in the 70s and 80s and was formalized and standardized in the 00s by the W3C under the rubric of the Semantic Web. We introduce the following foundational concepts for building an ontology in OWL, the W3C standard language for representing ontologies:
- Individual things are OWL individuals - e.g., JaneDoe
- Kinds of things are OWL classes - e.g., Organization
- Kinds of relationships are OWL properties - e.g., worksFor
Through interactive sessions, participants will identify the key things in everyday subjects and how they are related to each other. We will start to build an ontology in healthcare, using this as a driver to introduce key OWL constructs that we use to describe the meaning of data. Key topics and learning points will be:
- An ontology is a model of subject matter that you care about, represented as triples.
- Populating the ontology as triples using TARQL, R2RML, and SHACL
- The ontology is used as a schema that gives data meaning.
- Building a semantic application using SPARQL.
We close the loop by again considering how ontology and semantic technology can help data scientists, and what next steps they may wish to take to learn more.

Fighting COVID-19 using Knowledge Graphs

Ying Ding - University of Texas

The COVID-19 pandemic has caused the tragic loss of human lives and the worst economic downturn since the Great Depression. Scientific discovery heavily depends on the knowledge accumulated by peer scientists. There are over 500,000 COVID-19 related articles published, but nobody can read them all. This talk highlights the importance of knowledge graphs and how they can help scientists and the public fight against COVID-19. It covers topics ranging from building a COVID-19 knowledge graph based on scientific publications to facilitate drug discovery, to using contrastive deep learning to enhance risk prediction for COVID patients, to investigating parachuting collaboration and research novelty in COVID-19 related research activities.

Business Transformers - Leveraging Transfer Learning for B2B Insights

Paul Azunre - Dun and Bradstreet

Pretrained transformer-based neural network models, such as BERT, GPT-3 & XLM have rapidly advanced the state of natural language processing (NLP). Instead of training models from scratch, practitioners can now download general pretrained models and quickly adapt them to new scenarios and tasks with relatively little training data. This has led to rapid advances in various applications, from medicine to natural language generation and chatbots. Paul will overview related recent advances and concepts, and discuss some ways in which these tools can be applied to extract B2B Insights – from detecting and measuring the various stages of the customer buying journey to extending any such analysis capability from English to multiple other languages.

For the overwhelmed data professionals: What to do when there is so much to do?

Yue Cathy Chang / Jike Chong

95% of the companies with data science teams have teams of fewer than ten members. In a nimble team with a wide range of possibilities to make business impacts with data science, how do you explore and prioritize the opportunities? How do you ensure that there are sponsors and champions for your project? How do you set realistic expectations with business partners for project success? If you are leading a team, how can you delegate your projects effectively?
In this talk, the presenters will share three techniques for project prioritization, two roles (sponsor and champions) to identify, four levels of confidence (for predictive models) to specify project success, and discuss best practices for delegating work as a team lead/manager.

A gentle introduction to using graph neural networks on knowledge graphs

Dave Bechberger - Amazon Web Services

Knowledge graphs are a hot topic in the enterprise data space, but they often suffer from a lack of rich connections within the data. Graph neural networks (GNNs) can help fill this gap by using the structure of connections within the data to predict new connections. The two seem like a natural partnership; however, understanding what each of them is and how to leverage them together is a challenge.
In this talk we'll provide a gentle introduction to the concepts of knowledge graphs and graph neural networks. We will discuss what a knowledge graph is, why you might want to use one, and some of the challenges you will face. We will then discuss key concepts of graph-based machine learning, including an introduction to how these models work, what use cases they solve, and how they can be leveraged within knowledge graphs. Finally, we'll walk through a few demonstrations to show the power of leveraging knowledge graphs and GNNs together.

What Happens Next? Event Predictions with Machine Learning and Graph Neural Networks

Jans Aasman - Franz Inc.

Enterprises have subscribed to the power of modeling data as a graph and the importance of using Knowledge Graphs for customer 360 and beyond. The ability to explain the results of AI models, and produce consistent results from them, involves modeling real-world events with the adaptive schema consistently provided via Knowledge Graphs.
Probably the most important reason for building Knowledge Graphs has been to answer the age-old question: "What is going to happen next?" Given the data, relationships, and timelines we know about a customer, patient, product, etc. ("The Entity of Interest"), how can we confidently predict the most likely next event?
For example, in healthcare, what is the outcome for this patient given the sequence of previous diseases, medications, and procedures? For manufacturers, what is going to require repair next in this aircraft, or at some other point in the supply chain?
Machine Learning and, more recently, Graph Neural Networks (GNNs) have emerged as a mature AI approach used by companies for Knowledge Graph enrichment. GNNs enhance neural network methods by processing graph data through rounds of message passing, in which nodes incorporate their own features as well as those of neighboring nodes. This creates an even more accurate representation of the entire graph network.
In this presentation we describe how to use graph embeddings and recurrent neural networks to predict events with Graph Neural Networks. We will also demonstrate creating a GNN in the context of a Knowledge Graph for building event predictions.
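The message-passing idea described above can be sketched without any framework: each node averages its neighbors' feature vectors and combines the result with its own features. This is a minimal, illustrative round of mean aggregation on a hypothetical toy graph, not the presenters' actual GNN.

```python
# One round of GNN-style message passing with mean aggregation.
def message_passing_round(features, neighbors):
    """features: node -> feature vector; neighbors: node -> adjacency list.
    Each node averages its neighbors' features, then blends them 50/50
    with its own features (a crude stand-in for a learned update)."""
    updated = {}
    for node, feat in features.items():
        msgs = [features[n] for n in neighbors.get(node, [])]
        if msgs:
            agg = [sum(vals) / len(msgs) for vals in zip(*msgs)]
        else:
            agg = [0.0] * len(feat)
        updated[node] = [(s + a) / 2 for s, a in zip(feat, agg)]
    return updated

features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
round1 = message_passing_round(features, neighbors)
```

Stacking several such rounds lets information travel multiple hops, which is what gives GNNs their richer representation of the whole graph; real systems replace the 50/50 blend with trained weight matrices.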

Introduction to Taxonomies for Data Scientists (tutorial)

Heather Hedden - Semantic Web Company

This tutorial/workshop teaches the fundamentals and best practices for creating quality taxonomies, whether for the enterprise or for specific knowledge bases in any industry. Emphasis is on serving users rather than on theory. Topics to be covered include: the appropriateness of different kinds of knowledge organization systems (taxonomies, thesauri, ontologies, etc.), standards, taxonomy concept creation and labeling, taxonomy relationship creation. The connection of taxonomies to ontologies and knowledge graphs will also be briefly discussed. There will be some interactive activities and hands-on exercises. This session will cover:
• Introduction to taxonomies and their relevance to data
• Comparisons of taxonomies and knowledge organization system types
• Standards for taxonomies and knowledge organization systems
• Taxonomy concept creation
• Preferred and alternative label creation
• Taxonomy relationship creation
• Taxonomy relationships to ontologies and knowledge graphs
• Best practices and taxonomy management software use

The Future of Taxonomies - Linking data to knowledge

Heather Hedden - Semantic Web Company

Taxonomies are no longer just for navigation or tagging documents. AI technologies, such as machine learning and NLP, are now being used in combination with taxonomies rather than merely in place of them. The results include applications in semantic search, personalization, recommendation, and question answering. Combining taxonomies with ontologies and linked instance data supports more powerful search and analytics across all kinds of data, structured and unstructured, not just documents. Topics to be discussed include:
- Trends in uses of taxonomies (industries, applications, implementation and management) and the benefits of taxonomies
- How knowledge-based recommendation and personalization systems are built
- How NLP and taxonomies support each other
- How taxonomies contribute to knowledge graphs
- Semantic Web standards and their benefits for taxonomies and other knowledge organization systems

Visual timeline analytics: applying concepts from graph theory to timeline and time series data

Corey Lanum - Cambridge Intelligence

Timelines are one of the most basic forms of data visualization, and plotting events on a timeline is usually a straightforward process. But what happens when we have connections in that data? Time-based data often takes on the properties of both time series and graph data sets, but the traditional visualizations for each are lacking. Time series visualizations like a stock ticker chart can show how a variable changes over time, but not how data elements are connected to one another. And node-link charts can show relationships, but not how they may be changing over time. This talk introduces the new concept of visual timeline analysis, a kind of visualization designed to detangle complex time-based connected data. Corey will share his experience helping organizations harness visual timelines in their applications, taking inspiration from the graph world. He'll show some examples of how a timeline visualization can show connected data over time and how looking at the data this way can produce some really interesting insights.

Unstructured Data Management: It's not just for your documents

Kirk Marple - Unstruk

For structured data, the 'Modern Data Stack' offers solutions for ETL, aggregation, visualization, and analytics. For unstructured data (i.e., images, audio, video, 3D, documents), there is no complementary 'Modern Unstructured Data Stack'. Disparate solutions exist for geospatial data, document management, and visual intelligence, but none provide a holistic view across unstructured data types, correlate unstructured data to structured data, or describe relationships between real-world assets and unstructured data. In this talk, I will discuss the concept of a data platform for all your unstructured data - not just documents - and how ETL, data modeling, visualization, and analytics differ for unstructured data.

Using Reproducible Experiments To Create Better Models

Milecia McGregor - Iterative

It's easy to lose track of which changes gave you the best result when you start exploring multiple model architectures. Tracking the changes in your hyperparameter values, along with code and data changes, will help you build a more efficient model by giving you an exact reproduction of the conditions that made the model better.
In this talk, you will learn how you can use the open-source tool, DVC, to increase reproducibility for two methods of tuning hyperparameters: grid search and random search. We'll go through a live demo of setting up and running grid search and random search experiments. By the end of the talk, you'll know how to add reproducibility to your existing projects.
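The two tuning strategies the talk covers can be sketched in a few lines of plain Python: grid search exhaustively evaluates every combination of hyperparameter values, while random search samples a fixed budget of combinations. The parameter space and the objective below are hypothetical stand-ins for real training runs; DVC's role (tracking each experiment's parameters, code, and data) is not shown here.

```python
# Grid search vs. random search over a toy hyperparameter space.
import itertools
import random

space = {"lr": [0.001, 0.01, 0.1], "batch_size": [16, 32, 64]}

def objective(params):
    # Stand-in for "train a model and return its validation score";
    # best at lr=0.01, batch_size=32 by construction.
    return -abs(params["lr"] - 0.01) - abs(params["batch_size"] - 32) / 100

# Grid search: evaluate every combination (9 runs here).
grid = [dict(zip(space, vals)) for vals in itertools.product(*space.values())]
best_grid = max(grid, key=objective)

# Random search: sample a fixed budget of combinations (5 runs here).
random.seed(0)
samples = [{k: random.choice(v) for k, v in space.items()} for _ in range(5)]
best_random = max(samples, key=objective)
```

Random search trades exhaustiveness for a controllable budget, which is why it often wins when only a few hyperparameters really matter; reproducibility tooling like DVC makes either set of runs repeatable by pinning the exact parameters, code, and data behind each result.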

What is Truth? - Strategies for managing semantic triples in large complex systems

Ryan Mitchell - Gerson Lehrman Group

Knowledge graphs and their ontology-based database kin have exploded in popularity in recent years. Through cheap data sources and powerful inference engines, they can grow extremely quickly. But when bad data gets in, their judgments can be poisoned just as fast. We will discuss strategies for managing semantic triples and assessing their “truthiness” in large complex systems with many sources of data, machine learning algorithms, and fact validation methods.

A Path to Strong AI

Jonathan Mugan - DeUmbra

We need strong artificial intelligence (AI) so it can help us understand the nature of the universe to satiate our curiosity, devise cures for diseases to ease our suffering, and expand to other star systems to ensure our survival. To do all this, AI must be able to learn condensed representations of the environment in the form of models that enable it to recognize entities, infer missing information, and predict events. And because the universe is of almost infinite complexity, AI must be able to compose these models dynamically to generate combinatorial representations to match this complexity. This talk will cover why models are needed for robust intelligence and why an intelligent agent must be able to dynamically compose those models. It will also cover the current state-of-the-art in AI model building, discussing both the strengths and weaknesses of neural networks and probabilistic programming. The talk will cover how we can train an AI to build models that will enable it to be robust, and it will conclude with how we can effectively evaluate AI using technology from computer-animated movies and videogames.
This talk is based on an article Jonathan will be publishing in The Gradient.

History of Network Science - A Look at How Networks Have Connected Us

Sean Robinson - Graphable

The rise of graph analytics and network science has been steady for several years. Today we see top companies offering new ways to leverage the interconnected power of networks to answer the increasingly complex and subtle questions businesses are asking of their data. However, using networks to model data is a technique that has developed over hundreds of years. This talk takes viewers through a timeline of network science as we explore how people have taken on novel challenges in the real world and used networks as a solution to these problems. Beginning in 1736 with the Seven Bridges of Königsberg and continuing through today's innovations in network science, we will explore the challenges that drove early innovators to rely on the power of networks. In doing so, we can both understand the groundwork that led to the innovations we see arising today and gain insight into how these innovations can shape our own solutions to everyday data problems.

Outrageous ideas for Graph Databases

Max De Marzi - Amazon Web Services

Almost every graph database vendor raised money in 2021. I am glad they did, because they are going to need the money. Our current graph databases are terrible and need a lot of work. There, I said it. It's the ugly truth in our little niche industry. That's why, despite waiting over a decade for the "Year of the Graph" to come, we still haven't set the world on fire. Graph databases can be painfully slow, they can't handle non-graph workloads, their APIs are clunky, and their query languages are either hard to learn or hard to scale. Most graph projects require expert shepherding to succeed. 80% of the work takes 20% of the time, but that last 20% takes forever. The graph database vendors optimize for new users, not grizzled veterans. They optimize for sales, not solutions. Come listen to a rant by an industry OG on where we could go from here if we took the time to listen to the users who haven't given up on us yet.

This Dashboard Should Have Been a Meeting

Michael Zelenetz - PEAK6

How many times has your team spent weeks or months developing a dashboard that ultimately serves almost no users? This talk will present a guideline for making your data products successful: what should go in a dashboard and what should not, how to design your dashboard for maximum impact, and, most importantly, when you should not start building a dashboard in the first place.

Thinking outside of the Euclidean Space: An Introductory Study of Graph Machine Learning and its Applications, with Hands-on Experience

Sachin Sharma - ArangoDB

We have all read many articles about convolutional neural networks, a well-known method for handling Euclidean data structures (like images, text, etc.). In the real world, however, we are also surrounded by non-Euclidean data structures like graphs, and the machine learning method for handling this type of data domain is known as graph neural networks. Therefore, in this session we will first dive deep into the concepts of Graph ML (Part 1), and during the hands-on session (Part 2) we will build a complete Graph ML application, from training a GraphSAGE model to deploying your first Graph ML model on NVIDIA's Triton Inference Server. This will be followed by updating an Amazon product graph (stored in ArangoDB) with predicted node embeddings and then making product recommendations with an ArangoDB AQL query.