Confirmed Graph Sessions
The conference hotel is now sold out, but rooms are still available for Thursday, Friday, and Saturday nights at the nearby Hampton Inn (Hilton), just one block away from the conference. Should those sell out, reasonably priced rooms are still available at the DoubleTree by Hilton on 15th St., just three blocks away.
We are now just beginning to announce the sessions. Please check back for updates. If you would like to speak at the Global Graph Summit, visit our proposals page for details.
TinkerPop Keynote
TinkerPop 2020
Joshua Shinavier - Uber
What is now Apache TinkerPop began in late 2009 as a collection of developer tools that intermingled graph structure and processing. The framework quickly evolved into a unifying data model and query paradigm for the emerging family of property graph databases, contributing significantly to the rise of graphs in industry and in the open source community. Ten years later, graphs are truly everywhere; there are dozens of graph systems implementing TinkerPop, and the labeled property graph data model is playing a major role in enterprise knowledge graphs and data integration efforts of all kinds. While TinkerPop has been hugely popular with developers, it is likely that graphs have only tapped a fraction of their potential. What will take this community “to eleven” are abstractions for structure and process that are as powerful from a formal and computational point of view as they are compelling to the human developer or end user. In this talk, we will take a brief look back in time at TinkerPop versions 1, 2, and 3 before reviewing the current state of the art and setting the stage for TinkerPop 4. Looking ahead to the next year or so, we are prioritizing strong schemas, strongly-typed traversals, and functional encapsulation of side-effects, exceptions, and transactions that will make data and process in TinkerPop far more portable across language environments and enable new-to-graph capabilities like automated data migration and query translation, as well as various new forms of query optimization. As TinkerPop transcends the JVM, we will rely to a greater extent on composable mappings and code generation to propagate data structures and logic into places no graph has gone before. Done right, we think graphs may become as ubiquitous as the relational model, but so much more interesting and so much more similar to the way we humans naturally structure our world. Of course, the best way to predict the future is to make it happen.
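For readers new to TinkerPop, here is a minimal sketch of the starting point for this roadmap: today's (TinkerPop 3) Gremlin style driven from Python via the gremlinpython driver. The server endpoint and the 'collaborates' schema are illustrative assumptions, not part of the talk.

```python
# A minimal TinkerPop 3 traversal from Python (gremlinpython driver).
# The endpoint and the 'collaborates' edge label are hypothetical.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(conn)

# "Who do my collaborators collaborate with?": a two-hop traversal.
names = (g.V().has('person', 'name', 'marko')
          .out('collaborates').out('collaborates')
          .dedup().values('name').toList())
print(names)
conn.close()
```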
Database Keynote
mm-ADT: A Multi-Model Abstract Data Type
Marko Rodriguez - RRedux
mm-ADT™ is a distributed virtual machine capable of integrating a diverse collection of data processing technologies. This is made possible via three language-agnostic interfaces: language, process, and storage. When a technology implements a respective interface, the technology is considered mm-ADT compliant and is able to communicate with any other compliant technologies via the virtual machine. In this manner, query language developers can develop languages irrespective of the underlying storage system being manipulated. Processing engines can be programmed by any query language and executed over any storage system. Finally, data storage systems automatically support all mm-ADT compliant query languages and processors.
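To make the three-interface idea concrete, here is a hedged Python sketch of the contract it describes. These class and method names are our own illustration, not the actual mm-ADT API.

```python
# Hypothetical sketch of mm-ADT's three language-agnostic interfaces.
# Names are illustrative only; they are not the real mm-ADT API.
from abc import ABC, abstractmethod
from typing import Any, Iterable, Iterator

class Language(ABC):
    """A query language compiles source text into VM instructions."""
    @abstractmethod
    def compile(self, query: str) -> Iterable[Any]: ...

class Process(ABC):
    """A processing engine evaluates VM instructions over a storage system."""
    @abstractmethod
    def evaluate(self, instructions: Iterable[Any], storage: "Storage") -> Iterator[Any]: ...

class Storage(ABC):
    """A storage system exposes its records to the VM."""
    @abstractmethod
    def scan(self) -> Iterator[Any]: ...

def run(query: str, language: Language, process: Process, storage: Storage) -> Iterator[Any]:
    # Any compliant language can drive any compliant processor
    # over any compliant storage system.
    return process.evaluate(language.compile(query), storage)
```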
Follow mm-ADT™ on GitHub, Twitter, StackExchange, or join the Slack Channel.
Knowledge Graph Keynote
A Brief History of Knowledge Graph's Main Ideas
Juan Sequeda - data.world
Knowledge Graphs can be considered to be fulfilling an early vision in Computer Science of creating intelligent systems that integrate knowledge and data at large scale. The term “Knowledge Graph” has rapidly gained popularity in academia and industry since Google popularized it in 2012. It is paramount to note that, regardless of the discussions on, and definitions of the term “Knowledge Graph”, it stems from scientific advancements in diverse research areas such as Semantic Web, Databases, Knowledge Representation and Reasoning, NLP, Machine Learning, among others.
The integration of ideas and techniques from such disparate disciplines gives the notion of Knowledge Graph its richness, but at the same time makes it challenging for practitioners and researchers to know how current advances develop from, and are rooted in, early techniques.
In this talk, Juan will provide a historical context on the roots of Knowledge Graphs grounded in the advancements of the computer science disciplines of Knowledge, Data and the combination thereof, starting from the 1950s.
90-minute workshop
Graph Feature Engineering for More Accurate Machine Learning
Amy Hodler / Justin Fine - Neo4j
Graph enhanced AI and ML are changing the landscape of intelligent applications. In this workshop, we’ll focus on using graph feature engineering to improve the accuracy, precision, and recall of machine learning models. You’ll learn how graph algorithms can provide more predictive features. We’ll illustrate a link prediction workflow using Spark and Neo4j to predict collaboration, and discuss how to avoid missteps, with tips for getting measurable improvements.
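As a flavor of what graph feature engineering looks like in practice, here is a hedged sketch using the official neo4j Python driver. The Author/COAUTHOR schema, the credentials, and the choice of downstream classifier are illustrative assumptions, not the presenters' actual pipeline.

```python
# Turning graph structure into an ML feature with the neo4j Python driver.
# Schema, credentials, and feature choice are illustrative assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

# Common neighbours is a classic link-prediction feature: authors who share
# many collaborators are more likely to collaborate in the future.
QUERY = """
MATCH (a:Author {name: $a})-[:COAUTHOR]-(m)-[:COAUTHOR]-(b:Author {name: $b})
RETURN count(DISTINCT m) AS common_neighbors
"""

def common_neighbors(a: str, b: str) -> int:
    with driver.session() as session:
        return session.run(QUERY, a=a, b=b).single()["common_neighbors"]

# Features like this can be fed, alongside conventional ones, into any
# classifier (e.g., a random forest) to predict whether a link will form.
print(common_neighbors("Ada", "Grace"))
```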
90-minute workshop
Ontology for Data Scientists
Michael Uschold - Semantic Arts
We start with an interactive discussion to identify the main things data scientists do, why they do them, and what some of the key challenges are. We then give a brief overview of ontology and semantic technology, with the goal of identifying how and where it may be useful for data scientists.
The main part of the tutorial gives a deeper understanding of what an ontology is and how it is used. This technology grew out of core AI research in the 70s and 80s and was formalized and standardized in the 00s by the W3C under the rubric of the Semantic Web. We introduce the following foundational concepts for building an ontology in OWL, the W3C standard language for representing ontologies.
- Individual things are OWL individuals - e.g., JaneDoe
- Kinds of things are OWL classes - e.g., Organization
- Kinds of relationships are OWL properties - e.g., worksFor
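To make these three constructs concrete, here is a minimal sketch using Python's rdflib; the example.org namespace and AcmeCorp are illustrative assumptions.

```python
# The three OWL constructs above, written as triples with rdflib.
from rdflib import Graph, Namespace, RDF
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")
g = Graph()

g.add((EX.JaneDoe, RDF.type, OWL.NamedIndividual))  # an individual thing
g.add((EX.Organization, RDF.type, OWL.Class))       # a kind of thing
g.add((EX.worksFor, RDF.type, OWL.ObjectProperty))  # a kind of relationship
g.add((EX.JaneDoe, EX.worksFor, EX.AcmeCorp))       # data the ontology describes

print(g.serialize(format="turtle"))
```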
Through interactive sessions, participants will identify the key things in everyday subjects and how they are related to each other. We will start to build an ontology in healthcare, using it as a driver to introduce the key OWL constructs we use to describe the meaning of data. Key topics and learning points will be:
- An ontology is a model of subject matter that you care about, represented as triples.
- Populating the ontology as triples using TARQL, R2RML and SHACL
- The ontology is used as a schema that gives data meaning.
- Building a semantic application using SPARQL.
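As a hedged illustration of the last point, a semantic application can be as small as a single SPARQL query over the graph built in the previous sketch:

```python
# Querying the triples from the previous sketch with SPARQL via rdflib.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?person ?org
    WHERE { ?person ex:worksFor ?org . }
""")
for person, org in results:
    print(person, "works for", org)
```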
We close the loop by again considering how ontology and semantic technology can help data scientists, and what next steps they may wish to take to learn more.
Modeling, Querying, and Seeing Time Series Data within a Self-Organizing Mesh Network
Denise Gosnell - DataStax
Self-organizing networks rely on sensor communication and a centralized mechanism, like a cell tower, for transmitting the network's status.
So, what happens if the tower goes down? And, how does a graph data structure get involved in the network's healing process?
In this session, Dr. Gosnell will show how we see graphs in this dynamic network and how path information helps sensors come back online. She will walk through the data, the model, and the Gremlin queries that give a power company real-time visibility into different failure scenarios.
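To give a flavor of such queries, here is a hedged gremlinpython sketch that searches for a communication path from a sensor back to any reachable tower. The endpoint and the sensor/tower schema are illustrative assumptions, not the session's actual model.

```python
# Hypothetical path query: walk communication edges from a sensor until a
# tower is reached. Endpoint and schema are illustrative assumptions.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __

conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(conn)

paths = (g.V().has('sensor', 'name', 'sensor-42')
          .repeat(__.out('communicatesWith').simplePath())  # never revisit a node
          .until(__.hasLabel('tower'))
          .path().by('name')    # assumes every element carries a 'name'
          .limit(5).toList())
print(paths)
conn.close()
```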
Building a Graph User Interface for Malware Analysis
Stefan Hausotte - G DATA / Ethan Hasson - Expero / Kristin Stone - Expero
As a security company, G DATA has built a large JanusGraph database of information about different malware threats over the years. In this talk, the presenters will show how they built a web interface to explore the data. It allows malware analysts to gain insights about threats without having to query the database manually, and helps them understand the connections between malware through an appealing visualization. They will also discuss how they built a GraphQL API for the JanusGraph instance and how the user interface was built with open-source JavaScript libraries.
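As a hedged sketch of the GraphQL-layer idea, a minimal Python version using the graphene library might look like the following. The Malware type, its fields, and the Gremlin lookup are illustrative assumptions, not G DATA's actual schema.

```python
# Hypothetical GraphQL schema over a graph back end, using graphene.
import graphene

class Malware(graphene.ObjectType):
    name = graphene.String()
    family = graphene.String()

class Query(graphene.ObjectType):
    malware = graphene.Field(Malware, name=graphene.String(required=True))

    def resolve_malware(root, info, name):
        # In the real system this would be a Gremlin lookup against
        # JanusGraph, e.g. g.V().has('malware', 'name', name).
        return Malware(name=name, family="example-family")

schema = graphene.Schema(query=Query)
result = schema.execute('{ malware(name: "sample") { name family } }')
print(result.data)
```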
Responsible AI Requires Context and Connections
Amy Hodler - Neo4j
As creators and users of artificial intelligence (AI), we have a duty to guide the development and application of AI in ways that fit our social values, in particular, to increase accountability, fairness and public trust. AI systems require context and connections to have more responsible outcomes and make decisions similar to the way humans do.
AI today is effective for specific, well-defined tasks but struggles with ambiguity, which can lead to subpar or even disastrous results. Humans deal with ambiguity by using context to figure out what’s important in a situation and then extending that learning to understand new situations. In this talk, Amy Hodler will cover how artificial intelligence (AI) can be more situationally appropriate and “learn” in a way that leverages adjacency to understand and refine outputs, using peripheral information and connections.
Graph technologies are a state-of-the-art, purpose-built method for adding and leveraging context from data and are increasingly integrated with machine learning and artificial intelligence solutions in order to add contextual information. For any machine learning or AI application, data quality – and not just quantity – is critical. Graphs also serve as a source of truth for AI-related data and components for greater reliability. Amy will discuss how graphs can add essential context to guide more responsible AI that is more robust, reliable, and trustworthy.
GQL: Get Ready for a Standard Graph Query Language
Stefan Plantikow - Neo4j
A new standard query language is coming. For the first time in decades, the ISO Standards Committee has initiated work on a new language, a graph query language (GQL). With the backing of experts from seven nations and major database vendors, an early draft of this query language for property graphs is now ready.
Attend this session to learn about the initial features for GQL (ISO/IEC 39075), as well as ongoing efforts by the community and in the official standardization bodies. Get early information on capabilities such as the generalized property graph data model, advanced pattern matching, graph schema, parameterized graph views, query composition, and the advanced type system. As GQL is a sibling to SQL, we’ll also discuss how it aligns with shared features from the upcoming edition of SQL.
This talk will help you get ready for GQL, an unprecedented development in the graph landscape, with tips on planning for future transitions. You’ll also get guidance on how to engage with the GQL community and how to keep up to date with the official standardization process.
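For readers who have not seen property graph pattern matching before, here is a purely illustrative, Cypher-style query of the general kind GQL standardizes. This is not final GQL syntax; the standard was still in draft at the time of this talk.

```python
# Illustrative Cypher-style pattern matching of the kind GQL draws on.
# NOT final GQL syntax; schema and property names are made up.
query = """
MATCH (c:Customer)-[:PLACED]->(o:Order)-[:CONTAINS]->(p:Product {category: 'books'})
RETURN c.name, count(o) AS orders
"""
```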
Graph-Based Business Intelligence
Clark Richey - FactGem
This talk walks the audience through the adoption of graph databases to solve complex business problems, and the corresponding evolution of business intelligence tools required to support this new technology. The talk will cover the following elements:
Discussion of the benefits of graph databases vs. relational databases for data science and business use cases, e.g., Customer 360 and law enforcement:
- How a graph database is essentially an enhanced relational database in which relationships are expressed directly as edges, rather than indirectly as joins.
- How this structure gives rise to major benefits when dealing with very large datasets and when looking for patterns buried in multi-party relationships.
History of BI tools:
- Developed to solve the problem of visualizing data in relational databases; their core concepts are therefore table-based, and they need all the data in a relational structure.
Problems inherent in using traditional table-based BI against a graph structure:
- The explosion of data that occurs when moving data from graph to relational.
- How an easy graph query becomes a problematic relational join when exported to a relational structure (see the sketch after this outline).
- The extreme time needed to perform a multiple-join query against a large dataset in a relational DB.
- Relational databases require cleanly matching data organized around a single key, which makes it very difficult to join multiple datasets from different origins. We will explore how this can be solved in the graph.
- Relational databases don’t handle sparse data well: every edge case has to be included in the table and expressed explicitly, resulting in large, sparsely populated tables with many columns and poor performance.
Native, graph-based business intelligence:
- A direct connection to the graph database eliminates the data bloat and the problematic relational joins.
- We will demonstrate the speed, the pattern matching, and the ability to gather relevant information and visualize it instantly that are inherent in this solution.
- We will discuss and demonstrate some of the more sophisticated graph-based visualizations and algorithms that are now possible.
- Advantages of linking traditional BI charts with graph and graph exploration.
- Existing big data concepts in GraphBI: how Map, Reduce, Group, and Filter are also a great fit for large graph analysis.
- Visualizations that leverage Map, Reduce, Group, and Filter for graphs.
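The sketch referenced in the outline above: a hypothetical schema contrasting a two-hop relational query with its graph equivalent. The table names and the Gremlin form are illustrative assumptions.

```python
# Friends-of-friends as a relational multi-join vs. a graph traversal.
# Hypothetical schema; each extra hop adds another pair of SQL joins.
SQL = """
SELECT DISTINCT p3.name
FROM person p1
JOIN friendship f1 ON f1.person_a = p1.id
JOIN person p2     ON p2.id = f1.person_b
JOIN friendship f2 ON f2.person_a = p2.id
JOIN person p3     ON p3.id = f2.person_b
WHERE p1.name = 'Alice';
"""

# The equivalent graph traversal walks edges directly instead of joining:
#   g.V().has('person', 'name', 'Alice').
#     repeat(out('friend')).times(2).dedup().values('name')
```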
Managing Relationships in the Healthcare Industry with Graphileon: A CHG Healthcare Use Case
Tyler Glaittli - CHG Healthcare
One of the great powers of graph technology is that it visually communicates relationships to users. At CHG Healthcare, we're using Graphileon to manage complex relationships in America's healthcare industry in an intuitive and non-technical way. Many current information systems in production today weren't built to manage multi-layered, complex relationships. As a part of CHG's digital transformation, we're integrating a new wave of tools to bring order to the chaos. Neo4j and Graphileon are at the forefront of this transformation. In this presentation, we'll describe some of the challenges of managing complex relationships in a monolithic framework with legacy systems. Then we'll show how a solution built with Graphileon's components (functions that are connected by triggers) allows us to manage the most critical components of our industry. Lastly, we'll share some of the lessons we learned.
Query Processor of GraphflowDB and Techniques for the Graph Databases of 2020s
Semih Salihoglu - University of Waterloo
Graph database management systems (GDBMSs), in contemporary jargon, are systems that adopt the property graph model and often power applications, such as fraud detection and recommendations, that require very fast joins of records, often beyond the performance that existing relational systems generally provide. Several techniques are universally adopted by GDBMSs to speed up joins, such as double indexing of pre-defined relations in adjacency lists and ID-based hash joins. In this talk, I will give an overview of the query processor of GraphflowDB, a graph database we are actively developing at the University of Waterloo, which integrates three other novel techniques to perform very fast joins tailored for large-scale graphs: (1) worst-case optimal, intersection-based joins; (2) a novel indexing sub-system that allows indexing subsets of edges, similar to relational views, and allows adjacency lists to be bound to edges; and (3) factorized processing, which allows query processing over compressed intermediate data. These techniques were introduced by the theory community in the context of relational database management systems, but I will argue that one of their best applications is in GDBMSs.
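To illustrate the first technique, here is a toy, plain-Python version of intersection-based ("generic join" style) triangle enumeration. GraphflowDB's actual implementation is far more sophisticated; treat this only as the core idea: extend partial results one vertex at a time by intersecting adjacency sets rather than joining edge tables pairwise.

```python
# Toy worst-case optimal join: enumerate triangles by intersecting
# adjacency sets instead of joining edge tables pairwise.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

triangles = set()
for a in adj:
    for b in adj[a]:
        # Every c adjacent to both a and b closes a triangle.
        for c in adj[a] & adj[b]:
            triangles.add(tuple(sorted((a, b, c))))

print(sorted(triangles))  # [(1, 2, 3), (2, 3, 4)]
```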
Creating Explainable AI with Rules
Jans Aasman - Franz Inc.
This talk is based on Jans' recent article for Forbes magazine.
"There’s a fascinating dichotomy in artificial intelligence between statistics and rules, machine learning and expert systems. Newcomers to artificial intelligence (AI) regard machine learning as innately superior to brittle rules-based systems, while the history of this field reveals both rules and probabilistic learning are integral components of AI.
This fact is perhaps nowhere truer than in establishing explainable AI, which is central to the long-term business value of AI front-office use cases."
"The fundamental necessity for explainable AI spans regulatory compliance, fairness, transparency, ethics and lack of bias -- although this is not a complete list. For example, the effectiveness of counteracting financial crimes and increasing revenues from advanced machine learning predictions in financial services could be greatly enhanced by deploying more accurate deep learning models. But all of this would be arduous to explain to regulators. Translating those results into explainable rules is the basis for more widespread AI deployments producing a more meaningful impact on society."
Automated Encoding of Knowledge from Unstructured Natural Language Text into a Graph Database
Chris Davis - Lymba
Most contemporary data analysts are familiar with mapping knowledge onto tabular data models like spreadsheets or relational databases. However, these models are sometimes too broad to capture subtle relationships between granular concepts among different records. Graph databases provide this conceptual granularity, but they typically require that knowledge be curated and formatted by subject matter experts, which is extremely time- and labor-intensive. This presentation describes an approach to automating the conversion of natural language text into a structured RDF graph database.
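As a hedged sketch of the final step of such a pipeline, tuples produced by an NLP stack can be emitted as RDF triples with Python's rdflib. The extraction stage is the hard part and is only stubbed here; the namespace and example facts are illustrative.

```python
# Emitting NLP-extracted (subject, relation, object) tuples as RDF triples.
# The extraction stage is stubbed; namespace and facts are illustrative.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/kb/")

extracted = [                       # pretend these came from the NLP pipeline
    ("Paris", "capitalOf", "France"),
    ("France", "memberOf", "EuropeanUnion"),
]

g = Graph()
for subj, rel, obj in extracted:
    g.add((EX[subj], EX[rel], EX[obj]))

print(g.serialize(format="turtle"))
```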
Improving Real-Time Predictive Algorithms with Asynchronous Graph Augmentation
Dave Bechberger / Kelly Mondor - DataStax
Shop online, swipe a credit card, check-in on social media – predictive algorithms are watching all of this in real-time, analyzing the behaviors in order to find fraud, or tailor a news feed, or just suggest some other product to purchase.
Graphs are frequently helpful when working with these sorts of predictive algorithms, as these use cases can benefit heavily from examining how data is connected. The difficulty is that the relevance of connections changes over time, and efficiently finding the connections that matter becomes exponentially harder as more and more data is added. Historically, due to the length of time and amount of computation required, this has been addressed by running large batch processes daily, weekly, or even less frequently to update the relevance of connections within a graph. In today's world, however, this is not always fast enough.
We will show a method for decoupling these complex analytical transactions from real-time transactions to improve predictions in near real time without performance degradation. We will discuss how this method can leverage techniques such as graph analytics and machine learning to provide optimized graph connections, leading to more accurate predictions. Wrapping up, we will demonstrate how to apply this process to common use cases such as fraud and personalization to provide better real-time predictive results.
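A hedged sketch of this decoupling pattern in plain Python follows. The names, schedule, and in-memory score store are illustrative assumptions; in practice the scores would be written back to the graph as vertex properties.

```python
# Decoupling expensive graph analytics from the real-time scoring path.
# Illustrative only: scores live in a dict here; in a real system they
# would be written back to the graph as vertex properties.
import threading
import time

SCORES = {}  # account_id -> precomputed connection-based risk score

def recompute_connection_scores():
    # Stand-in for heavyweight analytics (centrality, community detection,
    # ML over graph features) that takes minutes, not milliseconds.
    SCORES.update({"acct-1": 0.97, "acct-2": 0.12})

def augment_graph(interval_s=300):
    # Asynchronous background loop: refresh scores off the hot path.
    while True:
        recompute_connection_scores()
        time.sleep(interval_s)

def score_transaction(account_id):
    # Real-time path: a cheap point lookup of a precomputed value, so
    # predictions never wait on graph-wide computation.
    return SCORES.get(account_id, 0.0)

threading.Thread(target=augment_graph, daemon=True).start()
```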
JGTSDB: A JanusGraph/TimescaleDB Mashup
Ted Wilmes - Expero
Time series data is ubiquitous, appearing in many use cases including finance, supply chain, and energy production. Consequently, “How should I model time series data in my graph database?” is one of the top questions folks have when first kicking the tires on a graph database like JanusGraph. Time series poses a number of challenges for a graph database, both as data is coming into the system and as it is being read back out. High volume and velocity mean you need to ingest tens to hundreds of thousands of points per second (or more!). Users expect to be able to perform low-latency aggregations and more complicated analytics functions over this time series data. JanusGraph can meet the ingest requirements, but doing so requires some very specific data modeling and tuning tricks that frequently are not worth the extra development complexity. Because of this, we’d usually recommend storing this data in an entirely different database that is more suited to time series workloads. In this talk, we will discuss an alternative approach in which we integrate TimescaleDB access into JanusGraph itself, allowing users to write a single Gremlin query that transparently traverses their graph and time series data. This setup inherits the operational characteristics of TimescaleDB while providing a single, unified, low-latency query interface that does not require careful and specific graph data modeling and tuning.
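A hypothetical flavor of the unified query described above (NOT the actual JGTSDB API; the turbine/sensor schema is an illustrative assumption): one Gremlin traversal reaches through the graph to a sensor and aggregates readings that, under the hood, live in TimescaleDB.

```python
# Hypothetical unified query: graph hops plus a time series aggregation in
# one Gremlin traversal. Schema and the TimescaleDB-backed 'reading'
# property are illustrative assumptions, not the actual JGTSDB API.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(conn)

avg_temp = (g.V().has('turbine', 'name', 'T-17')
             .out('hasSensor').has('type', 'temperature')
             .values('reading')   # transparently served from TimescaleDB
             .mean().next())
print(avg_temp)
conn.close()
```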
RedisGraph and GraphBLAS: Powering Graph Databases and Analytics with Linear Algebra
Roi Lipman - Redis Labs
The talk will present the main concepts and techniques applied by RedisGraph, the first property graph database to utilise sparse adjacency matrices and linear algebra to perform graph queries.
In the talk, I'll cover the execution of a Cypher query from start to finish. The focus will be on how different optimisations are applied and how query parts are converted into algebraic operations in order to produce a result set as quickly as possible. This will demonstrate the benefits of taking the algebraic approach.
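To show what "algebraic" means here, a small SciPy sketch: frontier expansion in a traversal is just a sparse matrix-vector product. RedisGraph does this with GraphBLAS; the principle is the same, though the adjacency matrix below is a toy.

```python
# One- and two-hop frontier expansion as sparse matrix-vector products.
import numpy as np
from scipy.sparse import csr_matrix

# Adjacency matrix of a 4-vertex digraph: edges 0->1, 0->2, 1->3, 2->3.
A = csr_matrix(np.array([[0, 1, 1, 0],
                         [0, 0, 0, 1],
                         [0, 0, 0, 1],
                         [0, 0, 0, 0]]))

frontier = np.array([1, 0, 0, 0])  # start the traversal at vertex 0

one_hop = A.T @ frontier           # vertices reachable in one step: 1 and 2
two_hop = A.T @ one_hop            # ... and in two steps: vertex 3
print(one_hop.nonzero()[0], two_hop.nonzero()[0])
```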
The talk will be self-contained: no technical prerequisites are strictly necessary. Audience members with some prior experience of graph databases, Cypher, and/or linear algebra will find it easier to focus on the material that is entirely new to them.
With increasing interest from both academia and industry in sparse adjacency matrices, the questions are how they can be used and whether they live up to expectations (memory reduction, fast and parallel processing). I believe this talk can shed some light on the matter, and hopefully inspire others to try working with them.
Crime Analysis with Visual Graph Transformation
Sony Green - Kineviz
Criminals are innovative. Because their strategies are constantly evolving, the data model that revealed pattern A could miss pattern B completely. We’ll present an intuitive analytics methodology based on visual graph transformations that gives crime fighters the flexibility to form inferences and iterate rapidly.
The multitude of attack vectors available to criminals enables their strategies to evolve rapidly, which imposes a limited shelf life on fraud-fighting solutions. It’s imperative to integrate new (and frequently messy) data sources, as well as to combine existing data sources in new ways. Combining sources like biometrics, social media, financial, geospatial, and time series data can reveal fraud rings, identity theft, and more. Unfortunately, quickly iterating on these data sources and managing them long term is a nightmare with traditional static schemas.
We have defined a set of Graph Operators (Map, Extract, Aggregate, Link, and Shortcut) that enable rapid, non-destructive transformation of a property graph. Individually or in concert, these operators enable statistical analysis, anomaly detection, and simplification or delineation of graph patterns. The resulting workflow is intuitive and highly visual, enabling non-technical users to perform complex analyses. For fraud fighters, effective application of graph operators is a game changer.
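As a hedged sketch of one of these operators, here is a "Shortcut"-style transformation in Python with networkx: collapsing a two-hop person-device-person pattern into a direct edge surfaces, for example, accounts that share a device. The semantics below are our reading of the description above, not Kineviz's implementation.

```python
# A "Shortcut"-style graph operator: derive person-person edges from a
# shared device, non-destructively, on a copy of the graph.
import networkx as nx
from itertools import combinations

G = nx.Graph()
G.add_nodes_from(["alice", "bob", "carol"], kind="person")
G.add_node("phone-1", kind="device")
G.add_edges_from([("alice", "phone-1"), ("bob", "phone-1"), ("carol", "phone-1")])

def shortcut(graph, via_kind="device", label="shares_device"):
    derived = nx.Graph()
    derived.add_nodes_from(
        n for n, d in graph.nodes(data=True) if d["kind"] != via_kind)
    for hub, data in graph.nodes(data=True):
        if data["kind"] == via_kind:
            # Link every pair of the hub's neighbours with a derived edge.
            for a, b in combinations(graph.neighbors(hub), 2):
                derived.add_edge(a, b, label=label, via=hub)
    return derived

print(shortcut(G).edges(data=True))
```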