Workshops at Data Day Texas 2022
The mini-workshops listed below are 90 minutes - the same length as two regular Data Day sessions. These workshops run throughout the day and are held in tiered classrooms with space to open and plug in your laptop. The goal of each workshop is to set you up with a new tool/skill and enough knowledge to continue on your own.
Introduction to Codeless Deep Learning
90 minute hands-on tutorial
Satoru Hayasaka - (KNIME)
Deep learning is used successfully in many data science applications, such as image processing, text processing, and fraud detection. KNIME offers an integration with the Keras libraries for deep learning, combining the codeless ease of use of KNIME Analytics Platform with the extensive coverage of deep learning paradigms by the Keras libraries. Even without code, implementing deep learning networks still requires orientation: distinguishing between learning paradigms, feedforward multilayer architectures, networks for sequential data, the encoding required for text data, convolutional layers for image data, and so on. This course will provide you with a basic orientation in the world of deep learning and the skills to assemble, train, and apply different deep learning modules.
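For orientation, here is a minimal sketch of the kind of feedforward Keras network that the KNIME Keras integration lets you assemble without code; the layer sizes, input dimensionality, and fraud-detection framing are illustrative only, not material from the workshop:

# Minimal sketch of a feedforward (multilayer) Keras network of the kind
# the KNIME Keras integration assembles codelessly. Sizes are hypothetical.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(30,)),  # hidden layer; 30 input features (hypothetical)
    keras.layers.Dense(1, activation="sigmoid"),                   # output, e.g. fraud / not fraud
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()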
RLlib for Deep Hierarchical Multiagent Reinforcement Learning
90 minute hands-on tutorial
Jonathan Mugan - (DeUmbra)
Reinforcement learning (RL) is an effective method for solving problems that require agents to learn the best way to act in complex environments. RLlib is a powerful tool for applying reinforcement learning to problems with multiple agents or where agents must take on multiple roles. Many resources exist for learning about RLlib from a theoretical or academic perspective, but there is a lack of material on how to use RLlib to solve your own practical problems. This tutorial helps fill that gap. We show you how to apply reinforcement learning to your own problem by walking through a custom environment and demonstrating how to apply RLlib to it. We make the code available on GitHub.
Requirements: ability to code in Python.
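To preview the workflow, here is a minimal sketch of a custom environment trained with RLlib's PPO, assuming Ray/RLlib 1.x and the classic OpenAI Gym API; the corridor environment and all names are illustrative, not the workshop's GitHub code:

# Minimal sketch: a custom Gym environment trained with RLlib's PPOTrainer.
# Assumes ray[rllib] ~1.x and gym; everything here is illustrative.
import gym
import numpy as np
import ray
from gym import spaces
from ray.rllib.agents.ppo import PPOTrainer

class Corridor(gym.Env):
    """Toy environment: walk right along a corridor to reach the goal."""
    def __init__(self, config):
        self.length = config.get("length", 5)
        self.pos = 0
        self.action_space = spaces.Discrete(2)  # 0 = step left, 1 = step right
        self.observation_space = spaces.Box(0.0, float(self.length), shape=(1,), dtype=np.float32)

    def reset(self):
        self.pos = 0
        return np.array([self.pos], dtype=np.float32)

    def step(self, action):
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos >= self.length
        reward = 1.0 if done else -0.1  # small step cost, bonus at the goal
        return np.array([self.pos], dtype=np.float32), reward, done, {}

ray.init()
trainer = PPOTrainer(env=Corridor, config={"env_config": {"length": 5}, "num_workers": 0})
for _ in range(3):
    print(trainer.train()["episode_reward_mean"])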
Ontology for Data Scientists - 90 minute tutorial
Michael Uschold - (Semantic Arts)
We start with an interactive discussion to identify the main things that data scientists do, why they do them, and some of the key challenges they face. We then give a brief overview of ontology and semantic technology, with the goal of identifying how and where it may be useful for data scientists.
The main part of the tutorial gives a deeper understanding of what an ontology is and how it is used. This technology grew out of core AI research in the 70s and 80s and was formalized and standardized in the 00s by the W3C under the rubric of the Semantic Web. We introduce the following foundational concepts for building an ontology in OWL, the W3C standard language for representing ontologies:
- Individual things are OWL individuals - e.g., JaneDoe
- Kinds of things are OWL classes - e.g., Organization
- Kinds of relationships are OWL properties - e.g., worksFor
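As a concrete flavor of these three constructs, here is a minimal sketch that writes them as triples using Python's rdflib; the ex: namespace and the Acme organization are invented for this illustration:

# Hypothetical sketch: the three OWL constructs above, written as triples
# with rdflib. The ex: namespace and names are invented for illustration.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, OWL

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.JaneDoe, RDF.type, OWL.NamedIndividual))  # an individual thing
g.add((EX.Organization, RDF.type, OWL.Class))       # a kind of thing
g.add((EX.worksFor, RDF.type, OWL.ObjectProperty))  # a kind of relationship
g.add((EX.JaneDoe, EX.worksFor, EX.Acme))           # a fact using all three
print(g.serialize(format="turtle"))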
Through interactive sessions, participants will identify the key things in everyday subjects and how they relate to each other. We will start to build an ontology in healthcare, using it as a driver to introduce the key OWL constructs we use to describe the meaning of data. Key topics and learning points will be:
- An ontology is a model of subject matter that you care about, represented as triples.
- Populating the ontology with triples using TARQL, R2RML, and SHACL
- The ontology is used as a schema that gives data meaning.
- Building a semantic application using SPARQL.
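As a flavor of that last step, here is a minimal sketch of a SPARQL query run with Python's rdflib; the data, namespace, and query are invented for illustration:

# Hypothetical sketch: a SPARQL query over triples like those above.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.JaneDoe, EX.worksFor, EX.Acme))
g.add((EX.JohnRoe, EX.worksFor, EX.Acme))

results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?person WHERE { ?person ex:worksFor ex:Acme . }
""")
for row in results:
    print(row.person)  # prints each employee's IRI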
We close the loop by again considering how ontology and semantic technology can help data scientists, and what next steps they may wish to take to learn more.
Introduction to Taxonomies for Data Scientists - 90 minute tutorial
Heather Hedden - (Semantic Web Company)
This tutorial/workshop teaches the fundamentals and best practices for creating quality taxonomies, whether for the enterprise or for specific knowledge bases in any industry. Emphasis is on serving users rather than on theory. Topics to be covered include: the appropriateness of different kinds of knowledge organization systems (taxonomies, thesauri, ontologies, etc.), standards, taxonomy concept creation and labeling, and taxonomy relationship creation. The connection of taxonomies to ontologies and knowledge graphs will also be briefly discussed. There will be some interactive activities and hands-on exercises. This session will cover:
• Introduction to taxonomies and their relevance to data
• Comparisons of taxonomies and knowledge organization system types
• Standards for taxonomies and knowledge organization systems
• Taxonomy concept creation
• Preferred and alternative label creation
• Taxonomy relationship creation
• Taxonomy relationships to ontologies and knowledge graphs
• Best practices and taxonomy management software use
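For a taste of concept, label, and relationship creation, here is a minimal sketch of a tiny taxonomy expressed in SKOS, a W3C standard commonly used for taxonomies (our choice for this sketch, not necessarily the workshop's), using Python's rdflib; the concepts and labels are invented:

# Hypothetical sketch: a two-concept SKOS taxonomy with preferred and
# alternative labels and a broader/narrower relationship.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/taxonomy/")
g = Graph()
g.add((EX.Beverage, RDF.type, SKOS.Concept))
g.add((EX.Beverage, SKOS.prefLabel, Literal("Beverage", lang="en")))
g.add((EX.Beverage, SKOS.altLabel, Literal("Drink", lang="en")))  # alternative label
g.add((EX.Coffee, RDF.type, SKOS.Concept))
g.add((EX.Coffee, SKOS.prefLabel, Literal("Coffee", lang="en")))
g.add((EX.Coffee, SKOS.broader, EX.Beverage))  # Coffee is narrower than Beverage
print(g.serialize(format="turtle"))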
A 90 minute hands-on GNN Primer
Everything you wanted to know about Graph AI (but were too afraid to ask)
Alexander Morrise - (Graphistry)
The utility of Graph Neural Nets (GNNs) is growing day by day. By bringing the power of graph thinking to the already-amazing world of neural networks, GNNs are achieving better results than graph analytics or neural nets have on their own. In some cases, GNNs even enable answering new kinds of questions about nodes and edges. Our workshop walks through how you can automatically turn your typical datasets into GNN-ready graphs for a variety of useful visual and analytic insights. Graph AI topics will include matching different tasks to different GNNs, choosing node and edge feature representations, and top examples of applying graph thinking to domains like social & behavioral data, event data, natural language, supply chain, and more. For technologies, we will focus on the popular path of using the open source PyData GPU ecosystem and modern data stack, including PyTorch, DGL, RAPIDS (cuGraph/cuDF), Arrow, and PyGraphistry[AI] from Jupyter/Streamlit. We will emphasize workflows that automate handling large and heterogeneous data. By the end, attendees should feel ready to go, in just 1-2 lines of code, from any data source (CSVs, Databricks, SQL, logs, graph databases) to powerful graph AI visualizations and models.
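To make the "1-2 lines of code" claim concrete, here is a minimal PyGraphistry sketch, assuming a (free) Graphistry account and an edge list in a pandas DataFrame; the DataFrame and column names are illustrative:

# Minimal sketch: visualize an edge list with PyGraphistry.
# Assumes graphistry and pandas are installed; data is invented.
import pandas as pd
import graphistry

graphistry.register(api=3, username="...", password="...")  # your credentials
edges = pd.DataFrame({"src": ["a", "a", "b"], "dst": ["b", "c", "c"]})
graphistry.edges(edges, "src", "dst").plot()  # interactive GPU-backed graph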
Graph Neural Networks with PyTorch Geometric, Nvidia Triton, and ArangoDB
Sachin Sharma - (ArangoDB)
We have all read plenty of articles about convolutional neural networks, a well-known method for handling Euclidean data (like images, text, etc.). In the real world, however, we are also surrounded by non-Euclidean data structures like graphs, and the machine learning methods that handle this type of data are known as graph neural networks. In this workshop, we will first dive deep into the concepts of graph neural networks and their applications (Part 1). Then, during the hands-on session (Part 2), we will build a complete graph neural network application with PyTorch Geometric, from training a GraphSAGE model to deploying your first GNN model on Nvidia's Triton Inference Server. We will then update an Amazon product graph with the predicted node embeddings and make product recommendations with ArangoDB's AQL query language.
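For a taste of Part 2, here is a minimal sketch of a two-layer GraphSAGE model in PyTorch Geometric; the channel sizes are illustrative, and this is not the workshop's actual code:

# Minimal sketch: a two-layer GraphSAGE model in PyTorch Geometric,
# producing per-node embeddings/logits. Sizes are illustrative.
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class GraphSAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = SAGEConv(in_channels, hidden_channels)
        self.conv2 = SAGEConv(hidden_channels, out_channels)

    def forward(self, x, edge_index):
        # x: node features [num_nodes, in_channels]; edge_index: [2, num_edges]
        h = F.relu(self.conv1(x, edge_index))
        h = F.dropout(h, p=0.5, training=self.training)
        return self.conv2(h, edge_index)  # per-node embeddings

model = GraphSAGE(in_channels=128, hidden_channels=64, out_channels=32)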