Workshops at Data Day Texas 2023

The mini-workshops are 90 minutes long - the same length as two regular Data Day sessions. These workshops run throughout the day and are held in tiered classrooms with space to open and plug in your laptop. The goal of each workshop is to set you up with a new tool or skill and enough knowledge to continue on your own.

Introduction to Taxonomies for Data Scientists

Heather Hedden - (Semantic Web Company)

This 90-minute tutorial - with an optional 40-minute hands-on session - teaches the fundamentals and best practices for creating and using quality taxonomies, whether for the enterprise or for specific knowledge bases in any industry. The emphasis is on serving users rather than on theory. Topics include: the appropriateness of different kinds of knowledge organization systems (taxonomies, thesauri, ontologies, etc.), standards, taxonomy concept creation and labeling, and taxonomy relationship creation. The connection of taxonomies to ontologies and knowledge graphs will also be discussed. This session will cover:
• Introduction to taxonomies and their relevance to data
• Comparisons of taxonomies and knowledge organization system types
• Standards for taxonomies and knowledge organization systems
• Taxonomy concept sources and creation
• Wording of concept labels
• Taxonomy concept relationships
• Semantically enriching a taxonomy to extend it to become an ontology

Following the 90-minute tutorial will be an optional additional 40-minute session for a deeper dive into taxonomies, which also includes hands-on exercises. The deeper dive topics include:
• Creating alternative labels
• Creating hierarchical relationships
• Taxonomy linking and mapping
• Taxonomy governance and quality
• Taxonomy management software demo

Escaping Excel Hell - Break Free From Old Patterns with KNIME

Scott Fincher / Dashiell Brookhart - (KNIME)

Spreadsheets have been part of the data landscape for decades, and while Excel is a familiar tool to many, it can often create as many problems as it solves. With KNIME Analytics Platform, you can transition many common data processing tasks from a spreadsheet to a visual framework, making your data journey easier for others to understand, simpler to automate, and even a bit of fun! In this workshop we will build workflows together that demonstrate common Excel tasks in action, like reading/writing data, filtering and lookups, and grouping and pivoting tables. You'll also learn how to add interactive KNIME visualizations to the mix, and we will even practice building and optimizing predictive models using a no-code approach. No previous KNIME experience is required - just download the free, open-source KNIME Analytics Platform (knime.com/downloads) beforehand and we'll provide the rest. Come join us to learn how you can escape "Excel hell" and save both yourself and your team time and stress!

Introduction to Graph Data Science for Python Developers

Sean Robinson - (Graphable)

This workshop will cover a variety of graph data science techniques using Python, Neo4j, and other libraries. The goal of the workshop is to serve as a springboard for attendees to identify which graph-based tools/techniques can provide novel value to existing workflows. Some of the techniques to be covered are:
• How to think about data as a graph and the implications for downstream analysis
• How to run graph algorithms at scale using Neo4j as well as other Pythonic libraries
• How to enhance traditional ML models with graph embeddings
• How to visualize these insights in the context of a graph for greater business intelligence
• How to integrate these techniques with your existing data science tool belt
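As a flavor of the "graph algorithms with Pythonic libraries" point above, here is a toy sketch using networkx (one such library). The edge list is made up for the example; think of each pair as a relationship extracted from tabular data:

```python
# Toy example: run a graph algorithm (PageRank) over a small directed graph
# using networkx. The people and edges are invented for illustration.
import networkx as nx

# Each tuple is a directed relationship, e.g. "alice refers to bob"
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "carol")]
G = nx.DiGraph(edges)

# PageRank scores each node by the link structure around it
scores = nx.pagerank(G)
most_central = max(scores, key=scores.get)
print(most_central)  # the node the structure points to most
```

The same idea scales up with the Neo4j Graph Data Science library, where algorithms like PageRank run inside the database rather than in client memory.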

Hands-On Introduction To GraphQL For Data Scientists & Developers

William Lyon - (Neo4j)

This hands-on workshop will introduce GraphQL, explore how to build GraphQL APIs backed by Neo4j, a native graph database, and show why GraphQL is relevant for both developers and data scientists. The workshop will show how to use the Neo4j GraphQL Library, which allows developers to quickly design and implement fully functional GraphQL APIs without writing boilerplate code. Together we will build a Node.js GraphQL API, including adding custom logic, adding authorization rules, and operationalizing data science techniques.

Outline
- Overview of GraphQL and building GraphQL APIs
- Building Node.js GraphQL APIs backed by a native graph database using the Neo4j GraphQL Library
- Adding custom logic to our GraphQL API using the @cypher schema directive and custom resolvers
- Adding authentication and authorization rules to our GraphQL API

Prerequisites
We will be using online hosted environments, so no local development setup is required. Specifically, we will use the Neo4j Aura database-as-a-service and CodeSandbox to run our GraphQL API application. Prior to the workshop, please register for Neo4j Aura and create a "Free Tier" database: dev.neo4j.com/neo4j-aura. You will also need a GitHub account to sign in to CodeSandbox, or you can create a CodeSandbox account at codesandbox.io.

Ontology for Data Scientists

Michael Uschold - (Semantic Arts)

We start with an interactive discussion to identify the main things data scientists do, why they do them, and some of the key challenges they face. We give a brief overview of ontology and semantic technology, with the goal of identifying how and where it may be useful for data scientists.
The main part of the tutorial gives a deeper understanding of what an ontology is and how it is used. This technology grew out of core AI research in the 70s and 80s, and was formalized and standardized in the 00s by the W3C under the rubric of the Semantic Web. We introduce the following foundational concepts for building an ontology in OWL, the W3C standard language for representing ontologies.
- Individual things are OWL individuals - e.g., JaneDoe
- Kinds of things are OWL classes - e.g., Organization
- Kinds of relationships are OWL properties - e.g., worksFor
Through interactive sessions, participants will identify the key things in everyday subjects and how they relate to each other. We will start to build an ontology in healthcare, using it as a driver to introduce the key OWL constructs we use to describe the meaning of data. Key topics and learning points will be:
- An ontology is a model of subject matter that you care about, represented as triples.
- Populating the ontology as triples using TARQL, R2RML and SHACL
- The ontology is used as a schema that gives data meaning.
- Building a semantic application using SPARQL.
We close the loop by again considering how ontology and semantic technology can help data scientists, and what next steps they may wish to take to learn more.