Graph Day! at Data Day Texas
Following the success of the inaugural Graph Day in January 2016, scores of people asked if we would consider bringing Graph Day back to Austin for 2017. We succumbed. There will be a Graph Day 2017, and it is included as part of Data Day Texas. Your Data Day ticket includes all the Graph Day sessions and workshops as well as all the Data Day and NLP sessions.
Below is a list of the currently confirmed Graph Day sessions. We are awaiting final abstracts from several of the confirmed speakers. For a complete list of Graph Day speakers, visit the Graph Day Speakers page.
NEW - Graph Databases: what's next?
Luca Garulli (OrientDB)
Luca Garulli, the Founder of OrientDB, the 2nd Graph Database on the market, will analyze the main differences between today's leading Graph Database products, discussing each product's strengths and the direction the Graph Database market is headed. If you're working with a Graph Database or you're interested in learning more about the Power of Graphs today and in the upcoming future, you can't miss this presentation.
NEW - Neo4j Graph Database Workshop For The Data Scientist Using Python. (90 minutes)
William Lyon (Neo4j)
Graph databases provide a flexible and intuitive data model that is ideal for many data science use cases such as ad-hoc data analysis, generating personalized recommendations, social network analysis, natural language processing, and fraud detection. In addition, Cypher, the query language for graphs, allows for traversing the graph by defining expressive graph queries using graph pattern matching. In this workshop we will work through a series of hands-on use cases using Neo4j and common Python data science tools such as pandas, igraph, and matplotlib. We will cover how to connect to Neo4j from Python, an overview of how to query graphs using Cypher, how to import data into Neo4j, data visualization, and how to use Python data science tools in conjunction with Neo4j for network analysis, generating recommendations, and fraud detection. Attendees should install Neo4j and Jupyter and be somewhat familiar with Python to get the most out of the session.
NEW - Enabling a Multimodel Graph Platform with Apache TinkerPop.
Jason Plurad (IBM / Apache Software Foundation)
Graphs are everywhere, but in a modern data stack, they are not the only tool in the toolbox. With Apache TinkerPop, adding graph capability on top of your existing data platform is not as daunting as it sounds. We will do a deep dive on writing Traversal Strategies to optimize performance of the underlying graph database. We will investigate how various TinkerPop systems offer unique possibilities in a multimodel approach to graph processing. We will discuss how using Gremlin frees you from vendor lock-in and enables you to swap out your graph database as your requirements evolve.
NEW - Graphs vs Tables: Ready? Fight.
Denise Gosnell (PokitDok)
Lessons learned from building similarity models from structured healthcare data in both graph and relational dbs
The infrastructure debate for the “optimal” data science environment is a loud and ever-changing conversation. At PokitDok, the data engineering and data science teams have tested and deployed a myriad of architecture combinations including dbs like Titan, DataStax Enterprise, Neo4j, Elasticsearch, MySQL, Cassandra, Mongo, … the list goes on. For us, the final implementations of tested and deployed data science pipelines became a balance of the scientific modeling domain, the right engineering tool, and a bunch of sandboxes.
In this talk, Denise Gosnell from PokitDok will discuss the polarizing false dichotomy of graph dbs vs. relational dbs. She will step through two different recommendation pipelines which ingest and transform structured healthcare transactions into similarity models. She will use (a) graph traversals to rank entities in a database, (b) relational tables to create co-occurrence similarity clusters, and then (c) discuss the modeling intricacies of each development process. Attendees of this talk will be introduced to the complexities of healthcare data analysis, step through graph and tabular based similarity models, and dive into the ongoing false dichotomy of graph vs relational dbs.
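For readers unfamiliar with co-occurrence similarity, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's pipeline: the `visits` data, the provider and patient names, and the choice of Jaccard similarity are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical transactions: each patient maps to the set of providers seen.
visits = {
    "patient_1": {"dr_a", "dr_b"},
    "patient_2": {"dr_a", "dr_b", "dr_c"},
    "patient_3": {"dr_c"},
}

def cooccurrence_similarity(visits):
    """Jaccard similarity between providers, based on the patients they share."""
    # Invert the mapping: provider -> set of patients who saw that provider.
    patients_by_provider = defaultdict(set)
    for patient, providers in visits.items():
        for provider in providers:
            patients_by_provider[provider].add(patient)

    # Score every provider pair by shared patients over all patients of either.
    scores = {}
    for a, b in combinations(sorted(patients_by_provider), 2):
        shared = patients_by_provider[a] & patients_by_provider[b]
        union = patients_by_provider[a] | patients_by_provider[b]
        scores[(a, b)] = len(shared) / len(union)
    return scores

scores = cooccurrence_similarity(visits)
```

In a relational setting the same inversion and pairing would typically be a self-join on the patient column; the dictionary version above just makes the steps explicit.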
NEW - Building a Graph Database in the Cloud: challenges and advantages.
Alaa Mahmoud (IBM)
New and existing Graph Database users face various challenges that make it hard to get started and to contain the cost of maintaining the infrastructure. A Cloud offering that’s cost-effective, robust and scalable seems to be the right answer to these challenges. However, it comes with its own challenges as well. In this talk, we’ll go over the lessons learned from building IBM Graph, a Graph database as a service offering from IBM. Here are the topics we'll be presenting in this talk:
- Hurdles that slow down the adoption of Graph databases
- The need for a cloud-based Graph Database solution
- Different strategies to provide a cloud solution
- Challenges that face Graph Database providers in putting a Graph database on the cloud.
NEW - Graphs in time and space: A visual example
Corey Lanum, Cambridge Intelligence
Graphs and graph databases are helping to solve some of today’s most pressing challenges. From managing critical infrastructure and understanding cyber threats to detecting fraud, we have worked with hundreds of developers building all kinds of mission-critical graph applications.
In almost all of these projects, graphs are being used not just to understand the ‘who’ / ‘how’ / ‘what’ questions, but also the ‘where’ and ‘when’.
This presentation will explore two dimensions of graphs that, from our experience, cause the most confusion but potentially contain vital data insight: space and time.
Corey will use visual examples to explain the quirks (and importance) of dynamic and geospatial graphs. He will then show how graph visualization tools empower users to explore connections between people, events, locations and times.
Do I need a Graph Database
Juan Sequeda, Capsenta
This talk grew out of Juan Sequeda's office hours following the Seattle Graph Meetup. Some of the questions posed were: How do I recognize a problem best solved with a graph solution? How do I determine the best type of graph to solve the problem? How do I manage the data where both graph and relational operations will be performed? Juan did such a great job of explaining the options that we asked him to develop his responses into a formal talk.
Time for a new relation: Going from RDBMS to Graph
Patrick McFadin, DataStax
Most of our introductory graph sessions come from practitioners with a heavy graph background. Patrick McFadin will present a session from the perspective of someone with a broad relational background (at scale) who has recently started working with graphs.
Like many of you, I have a good deal of experience building data models and applications using a relational database. Along the way you may have learned to data model for non-relational databases, but wait! Now we are seeing Graph databases increase in popularity and here’s yet another thing to figure out. I’m here to help! Let’s take all that hard-won database knowledge and apply it to building proper graph-based applications. You should take away the following:
- How graph creates relations differently than an RDBMS
- How to insert and query data
- When to use a graph database
- When NOT to use a graph database
- Things that are unique to a graph database
Moving Your Data To Graph
Dave Bechberger, Expero
Graphs are a great analysis and transactional model for certain kinds of data, but unless you're starting your company from scratch, chances are you've got relational or document data you'd like to start with. Using cases from recent work, we will discuss the fundamentals of good graph data modeling and how relational models and document models are best expressed in property graph form, including some common anti-patterns.
NEW - Traversing our way through Spark GraphFrames and GraphX
Mo Patel, Think Big
The power of network effects has been well studied and put into production by some of the most successful organizations around the world. Networks form graph data structures, and being able to harness analytic value from these structures further increases the utility of networks. In this talk, Mo Patel will review the newly introduced Spark GraphFrames feature and walk through an end-to-end Graph Analytics use case using the Spark GraphX library.
NEW - Implementing Network Algorithms in TinkerPop's GraphComputer
Ted Wilmes, Apache Software Foundation / Expero
The Apache TinkerPop project comes with a set of centrality and clustering graph algorithm implementations, but even more importantly, provides the building blocks to implement your own. There is a plethora of other algorithms that support a wide variety of use cases including fraud detection, flow analysis, and resource scheduling, to name just a few. This talk will dig into how the TinkerPop GraphComputer can execute vertex programs in parallel across massive graphs and how you can implement algorithms that fit your specific use cases.
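As a rough illustration of the vertex-program idea, here is a minimal sketch in plain Python. It does not use the TinkerPop API; the function name, the dict-based graph, and the sample data are assumptions made for the example. Each iteration is one synchronous superstep: vertices first send messages along their edges, then each vertex updates its own state using only the messages it received, which is what makes the computation easy to run in parallel.

```python
def pagerank_vertex_program(out_edges, iterations=20, damping=0.85):
    """Run PageRank as synchronous supersteps over a dict-based graph.

    `out_edges` maps each vertex to a list of target vertices; every target
    is assumed to also appear as a key, and every vertex to have out-edges.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(iterations):
        # Message phase: each vertex sends an equal share of its rank
        # along every outgoing edge.
        inbox = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
        # Compute phase: each vertex updates its state from its inbox only.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in out_edges}
    return rank

# Hypothetical three-vertex graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_vertex_program(graph)
```

A real vertex program would run the compute phase on many workers at once and exchange messages over the network; the single loop here only mirrors the structure of a superstep.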
NEW - Graphs + Sensors = The Internet of Connected Things
Ryan Boyd (Neo4j)
There is no question that the proliferation of connected devices has increased the volume, velocity, and variety of data available. Deriving value and business insight from this data is an ever-evolving challenge for the enterprise. Moving beyond analyzing just discrete data points is when the real value of streaming sensor data begins to emerge. Graph databases allow for working with data in the context of the overall network, not just a stream of values from a sensor. This talk will cover an architecture for working with streaming data and graph databases, use cases that make sense for graphs and IoT data, and how graphs can enable better real-time decisions from sensor data.
NEW - Graph Query Languages
Juan Sequeda, Capsenta
The Linked Data Benchmark Council (LDBC) is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark results for graph data management software. The Graph Query Language task force of LDBC is studying query languages for graph data management systems, and specifically those systems storing so-called Property Graph data. The goals of the GraphQL task force are to:
- devise a list of desired features and functionalities of a graph query language
- evaluate a number of existing languages (i.e. Cypher, Gremlin, PGQL, SPARQL, SQL) and identify possible issues
- provide a better understanding of the design space and state-of-the-art
- develop proposals for changes to existing query languages, or even a new graph query language.
This query language should cover the needs of the most important use-cases for such systems, such as social network and Business Intelligence workloads.
This talk will present an update of the work accomplished by the LDBC GraphQL task force. We also look for input from the graph community.
NEW - Time Series and Audit Trails: Modeling Time in an Industrial Equipment Property Graph
Sebastian Good, Expero
Natural networks make great cases for graph databases -- telecommunications, interconnected parts in engines, transportation routes. Companies collect changes in and measurements across this network: new connections made, maintenance and sensor readings, truck locations. In this talk, we will discuss several methods for storing sensor data in graph databases, and for storing a history of changes to a network. Concepts covered will include time series, temporal and bitemporal models.
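To make the bitemporal idea concrete, here is a small sketch in plain Python. It is an assumption-laden illustration rather than the speaker's model: the `EdgeVersion` class, its field names, and the pump and valve data are hypothetical. Each edge version carries two time ranges, one for when the connection existed in the real world and one for when the database believed that fact.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class EdgeVersion:
    source: str
    target: str
    valid_from: datetime              # when the connection started in the real world
    valid_to: Optional[datetime]      # None means the connection is still in place
    recorded_at: datetime             # when this version was written to the database
    superseded_at: Optional[datetime] # when a later correction replaced this version

def edges_as_of(versions: List[EdgeVersion], valid_at: datetime,
                known_at: datetime) -> List[Tuple[str, str]]:
    """Edges valid at `valid_at`, according to what was recorded as of `known_at`."""
    result = []
    for v in versions:
        known = v.recorded_at <= known_at and (
            v.superseded_at is None or known_at < v.superseded_at)
        valid = v.valid_from <= valid_at and (
            v.valid_to is None or valid_at < v.valid_to)
        if known and valid:
            result.append((v.source, v.target))
    return result

# Hypothetical history: on March 1 we learn the connection actually ended February 1.
history = [
    EdgeVersion("pump_1", "valve_7", datetime(2017, 1, 1), None,
                datetime(2017, 1, 2), datetime(2017, 3, 1)),
    EdgeVersion("pump_1", "valve_7", datetime(2017, 1, 1), datetime(2017, 2, 1),
                datetime(2017, 3, 1), None),
]

before_fix = edges_as_of(history, datetime(2017, 2, 15), datetime(2017, 2, 20))
after_fix = edges_as_of(history, datetime(2017, 2, 15), datetime(2017, 3, 5))
```

Asking the same question about February 15 gives different answers depending on when it is asked, which is the audit-trail property a purely temporal model (valid time only) cannot provide.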
NEW - Meaningful User Experience with Graph Data
Chris LaCava, Expero
Congratulations, your data is up and running in a graph database! This is the first step of many to unlocking the potential in your data. It’s easy to get mired in the complexities of graph technology and forget that real users, mere mortals, will need to use this information to inform mission critical tasks. To get the value out of your graph investment, you’ll need to provide an experience that enables users to explore and visualize your graph data in meaningful ways. In this talk we’ll take a hands-on approach to applying user-centered strategies and leveraging the latest UI tools to rapidly create great experiences with graph data. Topics will include tailoring experiences to the intended audience and data, and determining the right visualization for the job.
NEW - LEBM: Making a Thoroughly Nasty Graph Database Benchmark
David Mizell, Cray
LUBM (Lehigh University BenchMark) is the most-used benchmark for measuring the query performance of graph databases that use the World-Wide Web Consortium’s “RDF Triples” data representation standard and the SPARQL query language (also a W3C standard). The LUBM benchmark contains 14 SPARQL queries that are run against a synthetic database that contains information about however many fictional universities the user specifies. The LUBM synthesizer (written in Java) creates data about the universities’ faculties, their students, their grad students, what department they’re in, papers that the faculty and grad students have published, and so on. The problem with LUBM is that its data is too localized to make it representative of real graph databases. That makes it unrealistically easy to achieve high performance. It was really designed to test the power of a graph database’s ontology processing logic, rather than its performance on complex, graph-oriented queries. We started with the LUBM synthetic university data and superimposed a “social network” – x is friends with y, y is friends with z… between its students. This kind of graph is typically very irregular, non-local, unbalanced and thus hard to efficiently query. Social networks come up a lot in real-world graph databases, so this extension of LUBM (we call it LEBM, the Lehigh Extended BenchMark) is much more representative of the kinds of graphs people want to run queries against. We wrote the LEBM synthesizer in Java, and plan to make it publicly available, probably via Lehigh’s LUBM web site.
NEW - Make Graphs Great Again - Analyzing Election Data Using Neo4j
William Lyon (Neo4j)
The US 2016 election was data-rich - from hundreds of millions of tweets about the election, to polling data, to election results, to campaign funding reports. Throughout the election cycle, our team worked along with the Neo4j community to understand the relationship between these data. This talk will discuss how graphs enable us to use these relationships to understand the candidates, races, and overall election. Learn about the Cypher graph query language, graph algorithms (using user defined procedures in Java) and the neo4j-spatial extension and how graph analysis helped us make sense of the abundance of election related data.
NEW - How to Manage and Harness Large-Scale Graph Data with Grakn.
Haikal Pribadi (GRAKN.AI)
In this tutorial, we will describe the characteristics of large-scale interconnected data and why they are challenging to work with. We will then dive into different techniques to overcome these challenges using an open source knowledge graph, Grakn.
In order to maintain information consistency over large network data containing heterogeneous data types, the ability to expressively model your complex dataset is critical. We will demonstrate how to model your dataset through an ontology, which will also function as the schema to guarantee data consistency. You will then learn how to easily modify your schema to mimic any changes in your domain.
Big datasets often come from multiple sources, consist of different types and are in various formats, from JSON to CSV amongst others. Because of this heterogeneity, migration and consolidation into a single consistent store is fraught with problems. This section will cover some typical methodologies and tools used to migrate data into a single source as well as common issues encountered. We will also introduce a language to help us migrate this heterogeneous, multi-sourced data into a consolidated information network.
Performing complex queries efficiently is an integral part of processing interconnected big data. However, queries that involve multiple tables, different data formats or perform aggregation functions over this type of data are frequently verbose, slow to execute or both when using conventional datastores. You will learn about how to compress queries and reduce their complexity via generic rules that can be defined as reusable patterns. We will also demonstrate how to leverage domain specific rules to infer knowledge that is not explicitly stored.
The tutorial will conclude by introducing how to perform complex traversals and intelligent discoveries using a graph query language, Graql. We will show you how to explore connections in your information network, draw implicit insights from explicitly stored data, and perform real time analytics.
NEW - Large Scale Graph Analytics Through Graql.
Borislav Iordanov (GRAKN.AI)
We will discuss the development of graph analytics through distributed algorithms, the different types of analysis that are possible, and some of the potential benefits and business applications, such as fraud detection, recommendation engines, customer 360 and biomedical research. We go on to share our insight on the complexity, lack of reusability and specialised engineering talent required to implement graph analytics successfully, and will demonstrate how the development of graph analytics is costly for every unique dataset. We then introduce a method that combines an open-source knowledge graph, Grakn, with Apache Spark to run Bulk Synchronous Parallel (BSP) algorithms, such as MapReduce and Pregel, to perform massively parallel graph analytics through a graph query language. By abstracting the low-level implementation details of graph analytics, we show the audience a way to avoid the pitfalls of developing graph analytics from scratch as described above.
The audience will learn how they can harness the power of graph analytics through a few lines of a knowledge-oriented query language, Graql, to perform:
1. Cluster analysis to identify common structures within data
2. Path analysis to determine the shortest distance between pieces of information
3. Centrality analysis to identify the most interconnected instances in the network
4. Large scale statistics to summarise and understand quantitative data over information networks
We will demonstrate how the audience can perform simple queries for each type of analysis, how they can easily integrate it into the development of intelligent systems, and how Graql enables the development of powerful business applications.
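For orientation, the first three kinds of analysis listed above can be sketched on a tiny in-memory graph. This is plain Python, not Graql, and it runs on a single machine; the graph, the function names, and the choice of connected components, breadth-first search, and degree counts as stand-ins for cluster, path, and centrality analysis are all assumptions made for the example.

```python
from collections import deque

# Hypothetical undirected graph stored as an adjacency dict.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": {"f"},
    "f": {"e"},
}

def connected_components(graph):
    """Cluster analysis (simplest form): group vertices that can reach each other."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def shortest_path_length(graph, source, target):
    """Path analysis: breadth-first hop count, or None if unreachable."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return None

def degree_centrality(graph):
    """Centrality analysis (simplest form): count each vertex's neighbors."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

components = connected_components(graph)
hops = shortest_path_length(graph, "a", "d")
degrees = degree_centrality(graph)
```

A distributed engine would compute the same quantities with message-passing supersteps instead of a local queue, which is the part a query language layer would hide from the user.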