The Graph Day 2023 Sessions
What You Can't Do With Graph Databases
Tomás Sobat Stöfsel - (Vaticle)
Developing with graph databases has a number of challenges, such as the modelling of complex schemas and maintaining data consistency in your database.

In this talk, we discuss how TypeDB addresses these challenges, as well as how it compares to property graph databases. We'll look at how to read and write data, how to model complex domains, and TypeDB's ability to infer new data.

The main differences between TypeDB and graph databases can be summarised as:
1. TypeDB provides a concept-level schema with a type system that fully implements the Entity-Relationship (ER) model. Graph databases, on the other hand, use vertices and edges without integrity constraints imposed in the form of a schema.
2. TypeDB contains a built-in inference engine; graph databases don't provide native inferencing capabilities.
3. TypeDB offers an abstraction layer over a graph, and leverages a graph engine under the hood to create a higher-level model; graph databases offer a lower level of abstraction.
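To make the concept-level model concrete, here is a minimal read-query sketch, assuming the typedb-client 2.x Python API, a server on the default port, and an invented database named "social" whose schema defines a person entity with a name attribute; it is an illustration, not material from the talk:

    from typedb.client import TypeDB, SessionType, TransactionType

    # Hypothetical database "social"; queries address the ER-level schema
    # (entities, relations, attributes), not raw vertices and edges.
    with TypeDB.core_client("localhost:1729") as client:
        with client.session("social", SessionType.DATA) as session:
            with session.transaction(TransactionType.READ) as tx:
                answers = tx.query().match(
                    "match $p isa person, has name $n; get $n;"
                )
                for answer in answers:
                    print(answer.get("n"))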
Visualizing Connected Data as It Evolves Over Time
Janet Six - (Tom Sawyer)
Connected data visualization and analysis techniques are becoming increasingly popular for their ability to work well with graph databases and to communicate key results to stakeholders and decision makers. Static connected data benefits from well-established techniques, but what are the best practices for visualizing connected data that changes dynamically? And how do you best model the changes that are occurring in the system?
In this session, we will discuss how connected data can change over time and the implications of those changes for visualization and analysis techniques. We will also explore visualization techniques for dynamically changing connected data, including social networks that evolve over time, digital transformation model simulations, and event analysis. These visualization techniques allow us to:
• Apply post-situation analysis so that we can understand what happened in the system and when
• Better understand simulations of future scenarios and compare them
• Discover important trends
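One common way to model such change (sketched here with networkx rather than Tom Sawyer's tooling, which the session itself covers) is to time-stamp every edge and derive snapshots for post-situation analysis:

    import networkx as nx

    # Toy event log: each interaction carries a timestamp.
    G = nx.Graph()
    G.add_edge("alice", "bob", t=1)
    G.add_edge("bob", "carol", t=3)
    G.add_edge("alice", "dave", t=5)

    def snapshot(G, t):
        """Return the graph as it existed at time t (post-situation view)."""
        H = nx.Graph()
        H.add_nodes_from(G.nodes)
        H.add_edges_from(
            (u, v) for u, v, d in G.edges(data=True) if d["t"] <= t
        )
        return H

    # What did the network look like at t=3, and when did each link appear?
    print(snapshot(G, 3).edges)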
Zero-copy integration
Dave McComb - (Semantic Arts)
The reason we need to talk about zero-copy integration is that its opposite is so well entrenched that most practitioners can't imagine a world without some form of extract, transform and load, or system integration that copies and manipulates data through APIs. The traditional enterprise data landscape is an almost endless set of pipelines, data sets, and chutes and ladders that ferry data from its source to myriad destinations. This seems necessary because each application we use and each tool we employ has its own bespoke way of structuring data. Each application ends up morphing the prior application's idiosyncrasies into its own. In this talk we unpack the prerequisites needed to achieve data-centricity and zero-copy integration. We will present two case studies of firms that are enjoying zero-copy integration, along with a simple demonstration to make the idea more concrete.
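A toy sketch of the idea (not the talk's demonstration), assuming rdflib and an invented vocabulary: two "applications" read the same shared RDF graph directly, with no pipeline copying data between them.

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.com/core/")
    shared = Graph()  # stands in for a shared, centrally governed store
    shared.add((EX.JaneDoe, EX.worksFor, EX.Acme))
    shared.add((EX.Acme, EX.hasName, Literal("Acme Corp")))

    # "Application" 1: HR view. No extract, no transform, no load.
    employees = list(shared.subjects(EX.worksFor, EX.Acme))

    # "Application" 2: reporting view, reading the very same triples.
    names = list(shared.objects(EX.Acme, EX.hasName))

    print(employees, names)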
Graphing without the database - creating graphs from relational databases
Corey Lanum - (Cambridge Intelligence)
Many projects I've worked on assume that presenting business users with a node-link view means transferring all the data to a graph database. Worse still, I've seen teams duplicate and synchronize their data into a second database, creating new layers of completely unnecessary complexity.
The truth is that analyzing and visualizing data as a graph doesn’t necessarily mean storing the data in a graph database, or in a graph format, at all.
While graph databases have value for complex traversal queries, in many cases, they create unnecessary complexity. The simpler model of translating tabular data to nodes and links on the fly is easier to implement and allows the flexibility to choose from different data models of the same source.
In this talk, I’ll walk through the process and architecture of building a graph application from the standard relational database you probably already have.
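By way of illustration (my sketch, not necessarily the talk's architecture), the on-the-fly translation can be as simple as mapping rows to nodes and foreign keys to links. The two-table schema below is invented, and only the standard library is used:

    import sqlite3

    # Hypothetical schema: customers(id, name), orders(id, customer_id, total).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO orders VALUES (10, 1, 99.0), (11, 2, 25.0);
    """)

    # Rows become nodes; the foreign key becomes a link. No second database.
    nodes, links = [], []
    for cid, name in conn.execute("SELECT id, name FROM customers"):
        nodes.append({"id": f"customer-{cid}", "label": name})
    for oid, cid, total in conn.execute("SELECT id, customer_id, total FROM orders"):
        nodes.append({"id": f"order-{oid}", "label": f"Order {oid}"})
        links.append({"source": f"customer-{cid}", "target": f"order-{oid}"})

    print(nodes)
    print(links)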
Clinical trials exploration: surfacing a clinical application from a larger Bio-Pharma Knowledge Graph
David Hughes - (Graphable)
Clinical, proteomic, and pharma knowledge graphs are complex aggregations of constituent subgraphs. These linked graphs provide meaningful insights as a whole, but in many cases a single subgraph can independently prove to be a valuable asset. In this session, David will identify possible applications of the NLM's Clinical Trials resource as a standalone application. He will review how to query the http://clinicaltrials.gov API and how to populate the graph and run ETL through tools like Hume Orchestra and Apache Hop. He will then explore how to create an application using Streamlit as a POC, and discuss potential refinements.
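As a hedged sketch of the first step, here is one way to query the API from Python with requests, using the v2 REST endpoint; the endpoint, parameters, and field names should be verified against the current http://clinicaltrials.gov documentation, and the session itself may use different tooling:

    import requests

    # Fetch a few melanoma trials from the ClinicalTrials.gov v2 API.
    resp = requests.get(
        "https://clinicaltrials.gov/api/v2/studies",
        params={"query.cond": "melanoma", "pageSize": 5},
        timeout=30,
    )
    resp.raise_for_status()

    for study in resp.json().get("studies", []):
        ident = study["protocolSection"]["identificationModule"]
        print(ident["nctId"], ident["briefTitle"])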
Protecting Against Ransomware Attacks using Kafka, Flink and Boostgraph
Brian Hall - (Qomplx)
Ransomware attacks are now commonplace in the news and only becoming more so, and the reported incidents do not include those handled quietly. Come see how Kafka, Flink, and graph technologies like boostgraph can be used to identify anomalous behaviors on corporate networks.
Prerequisites: General knowledge around messaging technologies, parallelism and autoscaling.
Takeaways: How to keep track of and manage network traffic in a highly scalable manner and interpret relative risk of potential breaches using readily available technologies.
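A minimal sketch of the pattern, with plain Python and kafka-python standing in for the Flink and boostgraph pieces, and an invented event schema of src/dst host pairs:

    import json
    from collections import defaultdict
    from kafka import KafkaConsumer

    # Consume network-flow events, e.g. {"src": "10.0.0.5", "dst": "10.0.0.9"}.
    consumer = KafkaConsumer(
        "network-flows",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b),
    )

    peers = defaultdict(set)  # adjacency: host -> set of distinct peers

    for event in consumer:
        flow = event.value
        peers[flow["src"]].add(flow["dst"])
        # A host suddenly fanning out to many peers (e.g. lateral movement
        # before encryption) is a classic ransomware precursor signal.
        if len(peers[flow["src"]]) > 50:
            print(f"anomalous fan-out from {flow['src']}")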
Outrageous ideas for Graph Databases
Max De Marzi - (Amazon Web Services)
Almost every graph database vendor raised money in 2021. I am glad they did, because they are going to need the money. Our current graph databases are terrible and need a lot of work. There, I said it. It's the ugly truth in our little niche industry. That's why, despite waiting for over a decade for the "Year of the Graph" to come, we still haven't set the world on fire. Graph databases can be painfully slow, they can't handle non-graph workloads, their APIs are clunky, and their query languages are either hard to learn or hard to scale. Most graph projects require expert shepherding to succeed. The first 80% of the work takes 20% of the time, but that last 20% takes forever. The graph database vendors optimize for new users, not grizzled veterans. They optimize for sales, not solutions. Come listen to a rant by an industry OG on where we could go from here if we took the time to listen to the users that haven't given up on us yet.
Enabling the Computational Future of Biology
Tomás Sobat Stöfsel - (Vaticle)
Computational biology has revolutionised biomedicine, and the volume of data it is generating is growing exponentially. This requires tools that enable computational and non-computational biologists to collaborate and derive meaningful insights. However, traditional systems are inadequate to accurately model and handle data at this scale and complexity.

In this talk, we discuss how TypeDB enables biologists to build a deeper understanding of life, and increase the probability of groundbreaking discoveries, across the life sciences.
Where Is the Graph? Best practices for extracting data from unstructured data sources for effective visualization and analysis
Janet Six - (Tom Sawyer)
As unstructured data becomes larger and more complex, and stakeholders expect increasingly useful results, we can apply graph techniques to discover ever-elusive insights. These graph techniques can assist with data discovery and understanding, and be used to communicate findings to decision makers. But what are the best practices for applying graph technology to the connected data that is inherent in unstructured data sources? Where is the graph?

Currently, many companies are still trying to visualize or analyze the whole data source. This leads to mixed results and hairball visualizations that may be beautiful artistically, but don't show the level of detail needed for visual analysis and for communicating results to stakeholders. How do we get beyond all the noise in unstructured data to discover the knowledge needed to bring business value to that data?

In this session, we will discuss several approaches to finding useful graphs in your unstructured data and how to apply visualization and analysis techniques to them.
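To illustrate one common answer (an entity co-occurrence graph, which may or may not be among the approaches the session covers), here is a standard-library sketch over invented text:

    import re
    from collections import Counter
    from itertools import combinations

    text = (
        "Acme acquired Globex. Globex supplies Initech. "
        "Initech partners with Acme."
    )

    # Naive entity spotting: capitalized words; real pipelines would use NER.
    edges = Counter()
    for sentence in re.split(r"[.!?]", text):
        entities = sorted(set(re.findall(r"\b[A-Z][a-z]+\b", sentence)))
        # Co-occurrence within a sentence becomes an edge between entities.
        edges.update(combinations(entities, 2))

    print(edges.most_common())  # the graph hiding in the unstructured text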
Workshops and Tutorials
Introduction to Graph Data Science for Python Developers
Sean Robinson - (Graphable)
This workshop will cover a variety of graph data science techniques using Python, Neo4j, and other libraries. The goal of the workshop is to serve as a springboard for attendees to identify which graph-based tools and techniques can provide novel value to existing workflows. Some of the techniques to be covered are (see the short sketch after this list):
• How to think about data as a graph and the implications that has on downstream analysis
• How to use graph algorithms at scale using both Neo4j and other pythonic libraries
• How to enhance traditional ML models with graph embeddings
• How to visualize these insights in the context of a graph for greater business intelligence
• How to integrate these techniques with your existing data science tool belt
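As one hedged illustration of the second and third bullets (my sketch with networkx and scikit-learn, not the workshop's materials): graph-derived features can feed a traditional model, with PageRank and degree standing in here for richer embeddings.

    import networkx as nx
    from sklearn.linear_model import LogisticRegression

    # Zachary's karate club: nodes carry a 'club' label we can predict.
    G = nx.karate_club_graph()
    pagerank = nx.pagerank(G)

    # Graph-derived features (a stand-in for learned embeddings).
    X = [[pagerank[n], G.degree(n)] for n in G.nodes]
    y = [G.nodes[n]["club"] == "Mr. Hi" for n in G.nodes]

    model = LogisticRegression().fit(X, y)
    print(f"training accuracy: {model.score(X, y):.2f}")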
Hands-On Introduction To GraphQL For Data Scientists & Developers
William Lyon - (Neo4j)
This hands-on workshop will introduce GraphQL, explore how to build GraphQL APIs backed by Neo4j, a native graph database, and show why GraphQL is relevant for both developers and data scientists. It will show how to use the Neo4j GraphQL Library, which lets developers quickly design and implement fully functional GraphQL APIs without writing boilerplate code, to build a Node.js GraphQL API, including adding custom logic, authorization rules, and operationalized data science techniques.
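By way of a hedged preview for the data-scientist side (not from the workshop materials), consuming such an API from Python takes only a few lines; the endpoint URL and the movie schema below are invented for illustration:

    import requests

    # Hypothetical endpoint exposed by a Neo4j GraphQL Library application.
    GRAPHQL_URL = "http://localhost:4000/graphql"

    query = """
    {
      movies(where: { title: "The Matrix" }) {
        title
        actors { name }
      }
    }
    """

    resp = requests.post(GRAPHQL_URL, json={"query": query}, timeout=30)
    resp.raise_for_status()
    print(resp.json()["data"])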
Outline
- Overview of GraphQL and building GraphQL APIs
- Building Node.js GraphQL APIs backed by a native graph database using the Neo4j GraphQL Library
- Adding custom logic to our GraphQL API using the @cypher schema directive and custom resolvers
- Adding authentication and authorization rules to our GraphQL API
Prerequisites
We will be using online hosted environments, so no local development setup is required. Specifically, we will use the Neo4j Aura database-as-a-service and CodeSandbox for running our GraphQL API application. Prior to the workshop, please register for Neo4j Aura and create a "Free Tier" database: dev.neo4j.com/neo4j-aura. You will also need a GitHub account to sign in to CodeSandbox, or you can create a CodeSandbox account at codesandbox.io.
Ontology for Data Scientists - 90 minute tutorial
Michael Uschold - (Semantic Arts)
We start with an interactive discussion to identify the main things that data scientists do, why they do them, and what some of the key challenges are. We then give a brief overview of ontology and semantic technology, with the goal of identifying how and where it may be useful for data scientists.
The main part of the tutorial gives a deeper understanding of what an ontology is and how it is used. This technology grew out of core AI research in the 70s and 80s, and was formalized and standardized in the 00s by the W3C under the rubric of the Semantic Web. We introduce the following foundational concepts for building an ontology in OWL, the W3C standard language for representing ontologies.
- Individual things are OWL individuals - e.g., JaneDoe
- Kinds of things are OWL classes - e.g., Organization
- Kinds of relationships are OWL properties - e.g., worksFor
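As a minimal grounding of these three constructs, here is an rdflib sketch (rdflib rather than a dedicated ontology editor; the namespace is invented, not the tutorial's healthcare ontology):

    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    EX = Namespace("http://example.com/ontology/")
    g = Graph()
    g.bind("ex", EX)

    # Kinds of things are OWL classes.
    g.add((EX.Organization, RDF.type, OWL.Class))
    g.add((EX.Person, RDF.type, OWL.Class))

    # Kinds of relationships are OWL properties.
    g.add((EX.worksFor, RDF.type, OWL.ObjectProperty))
    g.add((EX.worksFor, RDFS.domain, EX.Person))
    g.add((EX.worksFor, RDFS.range, EX.Organization))

    # Individual things are OWL individuals.
    g.add((EX.JaneDoe, RDF.type, EX.Person))
    g.add((EX.Acme, RDF.type, EX.Organization))
    g.add((EX.JaneDoe, EX.worksFor, EX.Acme))

    print(g.serialize(format="turtle"))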
Through interactive sessions, participants will identify the key things in everyday subjects and how they are related to each other. We will start to build an ontology in healthcare, using it as a driver to introduce the key OWL constructs that we use to describe the meaning of data. Key topics and learning points will be:
- An ontology is a model of subject matter that you care about, represented as triples.
- Populating the ontology as triples using TARQL, R2RML and SHACL
- The ontology is used as a schema that gives data meaning.
- Building a semantic application using SPARQL.
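And, for the last bullet, a sketch of the flavour of querying with SPARQL via rdflib (toy triples, not the tutorial's ontology):

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.com/ontology/> .
        ex:JaneDoe ex:worksFor ex:Acme .
        ex:JohnDoe ex:worksFor ex:Initech .
    """, format="turtle")

    # SPARQL reads the same triples the ontology gives meaning to.
    results = g.query("""
        PREFIX ex: <http://example.com/ontology/>
        SELECT ?person WHERE { ?person ex:worksFor ex:Acme . }
    """)
    for row in results:
        print(row.person)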
We close the loop by again considering how ontology and semantic technology can help data scientists, and what next steps they may wish to take to learn more.
Introduction to Taxonomies for Data Scientists - 90 minute tutorial
Heather Hedden - (Semantic Web Company)
This tutorial/workshop teaches the fundamentals and best practices for creating quality taxonomies, whether for the enterprise or for specific knowledge bases in any industry. Emphasis is on serving users rather than on theory. Topics to be covered include: the appropriateness of different kinds of knowledge organization systems (taxonomies, thesauri, ontologies, etc.), standards, taxonomy concept creation and labeling, taxonomy relationship creation. The connection of taxonomies to ontologies and knowledge graphs will also be briefly discussed. There will be some interactive activities and hands-on exercises. This session will cover:
• Introduction to taxonomies and their relevance to data
• Comparisons of taxonomies and knowledge organization system types
• Standards for taxonomies and knowledge organization systems
• Taxonomy concept creation
• Preferred and alternative label creation (see the sketch after this list)
• Taxonomy relationship creation
• Taxonomy relationships to ontologies and knowledge graphs
• Best practices and taxonomy management software use
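By way of illustration, labels and hierarchical relationships like these are typically expressed in SKOS; here is a minimal rdflib sketch with an invented concept scheme (SKOS itself may or may not be the tutorial's focus):

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    EX = Namespace("http://example.com/taxonomy/")
    g = Graph()
    g.bind("skos", SKOS)

    # A concept with one preferred label and one alternative label.
    g.add((EX.Laptop, RDF.type, SKOS.Concept))
    g.add((EX.Laptop, SKOS.prefLabel, Literal("laptop", lang="en")))
    g.add((EX.Laptop, SKOS.altLabel, Literal("notebook computer", lang="en")))

    # A hierarchical (broader/narrower) taxonomy relationship.
    g.add((EX.Computer, RDF.type, SKOS.Concept))
    g.add((EX.Laptop, SKOS.broader, EX.Computer))

    print(g.serialize(format="turtle"))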