Graph Day Returns to Data Day Texas

Yes! Graph Day returns to Data Day Texas! Every time we host Graph Day, it gets bigger. This one will be a full house. If you want to take advantage of discount tickets, don't dawdle. Purchase yours now. Expect surprises. Check this page for updates.

Confirmed Graph Day Sessions

Keynote: The State of JanusGraph 2018

Ted Wilmes - Expero

Graph database adoption increased at a rapid clip in 2017 and shows no sign of slowing down as we begin 2018. When coupled with the right problem set, it's a compelling solution, and word has spread from the startup world all the way to the Fortune 500. JanusGraph, an Apache TinkerPop compatible fork of the popular Titan graph database, was one of the newcomers to the marketplace last year. Its future was uncertain, but a dedicated community coalesced around it; three releases later, with an ever-growing list of contributors, it is here to stay. This talk will introduce JanusGraph, discuss where it fits into the existing graph database ecosystem, and then review the progress made over the past year, with an eye to the exciting things coming up in 2018.

Improving Graph Based Entity Resolution using Data Mining and NLP

David Bechberger - Gene by Gene

“Hey, here are those new data files to add. I ‘cleaned’ them myself so it should be easy. Right?”
Words like these strike fear into the hearts of all developers, but integrating 'dirty' unstructured, denormalized and text-heavy datasets from multiple locations is becoming the de facto standard when building out data platforms.
In this talk we will look at how we can augment our graph's attributes using techniques from data mining (e.g. string similarity/distance measures) and Natural Language Processing (e.g. keyword extraction, named entity recognition). We will then walk through an example using this methodology to demonstrate the improvements in the accuracy of the resulting matches.
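The string-similarity side of this approach can be sketched with nothing but the Python standard library. A minimal sketch: the function names, sample strings, and 0.85 threshold below are illustrative assumptions, not taken from the talk.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase and strip punctuation so trivial formatting differences don't hurt the score."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace()).strip()

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two attribute strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_probable_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two vertex attributes as candidates for merging during entity resolution."""
    return similarity(a, b) >= threshold

print(is_probable_match("Gene By Gene, Ltd.", "gene by gene ltd"))   # True
```

In a real pipeline the threshold would be tuned per attribute type, and a dedicated measure (Levenshtein, Jaro-Winkler) would typically replace the stdlib matcher.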

AI and Graph to optimize steam process in a large process plant

Arnaud de Moissac - DCbrain / Jean-Reynald Macé - Areva

Steam production and distribution networks can cost a large process plant a lot of money. To optimise such complex networks, you have to deal with several parameters: physical issues like boiler efficiency or clogging of heat exchangers, availability constraints on your network regarding the business SLA, and non-linear dependencies between the pool of boilers and the steam network. It can be a bit tricky to build an objective function using standard physical models.
This talk will show how we first use a graph structure to model the network and extract features from it, then use deep learning to build transfer functions representing the behavior of each steam producer and consumer, and finally run an optimization based on this meta-model to find the best way to operate the plant.

3 ways to build a near real-time recommendation engine

David Gilardi - DataStax

David Gilardi will show how to add a near real-time recommendation engine to KillrVideo, a reference application built to help developers learn Cassandra, Graph, and DataStax Enterprise. David will discuss the benefits and practical considerations of building a solution that leverages multiple, highly connected data models such as graph and tabular. He’ll also take a look at multiple recommendation engine examples, including use of domain specific languages to make development even simpler.
An introduction to KillrVideo - David will briefly introduce the reference implementation, a cloud-based video sharing application which uses the Apache Cassandra core of DataStax Enterprise as well as DSE Search and DSE Graph integrations.
What do we mean by “multiple, highly connected models”? - David will talk about what this means and discuss the benefits of these attributes in building applications that include transaction processing, search, and graph.
Adding a recommendation engine - David will discuss the task of extending KillrVideo to provide real-time video recommendations using DSE Graph and the popular Gremlin graph traversal language with DSLs (Domain Specific Languages).
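The traversal logic such a recommendation engine encapsulates (whether or not it sits behind a Gremlin DSL) can be sketched in plain Python. The ratings data and function names below are hypothetical toy examples, not KillrVideo's actual model.

```python
from collections import Counter

# toy ratings "graph": user -> set of videos rated highly (hypothetical sample data)
ratings = {
    "alice": {"v1", "v2", "v3"},
    "bob":   {"v2", "v3", "v4"},
    "carol": {"v3", "v5"},
}

def recommend(user: str, k: int = 2) -> list[str]:
    """Collaborative-filtering walk: user -> rated videos -> other raters -> their videos.

    This mirrors the out()/in() hops a Gremlin traversal (or a DSL wrapping one)
    would perform over a ratings graph.
    """
    seen = ratings[user]
    scores = Counter()
    for other, vids in ratings.items():
        if other == user:
            continue
        overlap = len(seen & vids)          # strength of taste similarity
        for v in vids - seen:               # videos the user hasn't watched yet
            scores[v] += overlap
    return [v for v, _ in scores.most_common(k)]

print(recommend("alice"))   # ['v4', 'v5']
```

A graph DSL earns its keep by naming these hops after the domain (e.g. a single `recommendByRating(user)` step) instead of exposing raw traversal primitives.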

Everything is not a graph problem (but there are plenty)

Dr. Denise Gosnell - DataStax

As the reality of the graph hype cycle sets in, the graph pragmatists have shown up to guide the charge. What we are seeing and experiencing is an adjustment in mindset: the convergence to multi-model database systems parallels the mentality of using the right tool for the problem. With graph databases, there is an intricate balance to find where the rubber meets the road between theorists and practitioners.
Before hammering away on the keyboard to insert vertices and edges, it is crucial to iterate and drive the development life cycle from definitive use cases. Too many times the field has seen monoglot system thinking pressure the construction of the one graph that can rule it all, which can result in some impressive scope creep. In this talk, Dr. Gosnell will walk through common solution design considerations that can make or break a graph implementation and suggest some best practices for navigating common misconceptions.

How to Destroy Your Graph Project with Terrible Visualization

Corey Lanum - Cambridge Intelligence

We are all using graphs for a reason - in many cases, it's because the graph model presents an intuitive view of the data. Unfortunately, the most elegant graph data models can often be stymied by bad visualizations that obscure rather than enlighten. In this talk, Corey Lanum will discuss a number of bad practices in graph visualization that are surprisingly common. He will then outline graph visualization best practices to help create visual interfaces to graph data that convey useful insight into the data.

Real-time deep link analytics: The next stage of graph analytics

Dr. Victor Lee - TigerGraph

Graph databases are the fastest growing category in data management, according to DB-Engines. However, most graph queries only traverse two hops in big graphs due to limitations in most graph databases. Real-world applications require deep link analytics that traverse far more than three hops. To support real-time deep link analytics, we need the power of combining real-time data updates, big datasets, and deep link traversals.
Dr. Victor Lee offers an overview of TigerGraph's distributed Native Parallel Graph platform. He discusses the techniques behind it, including how it partitions graph data across machines, supports fast updates, and is still able to perform fast graph traversal and computation. He also shares a subsecond real-time fraud detection system that manages 100 billion graph elements to detect risk and fraudulent groups.
(Product Showcase)
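The deep-link idea of traversing k hops rather than two can be sketched as a depth-bounded breadth-first search. A minimal Python sketch; the adjacency data below is a hypothetical toy payment graph, not TigerGraph's engine or data.

```python
from collections import deque

def k_hop(adj: dict, start, k: int) -> set:
    """Return all vertices reachable from `start` in at most k hops (BFS by depth)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        v, d = frontier.popleft()
        if d == k:
            continue                        # don't expand beyond the hop budget
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return seen - {start}

# hypothetical payment chain: deeper hops reveal indirect links a 2-hop query misses
adj = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
print(sorted(k_hop(adj, "a", 2)))   # ['b', 'c']
print(sorted(k_hop(adj, "a", 4)))   # ['b', 'c', 'd', 'e']
```

The point of the talk is that doing this at 100-billion-element scale, with live updates, requires partitioning and parallelism that a single-machine sketch like this elides.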

Graph Analysis of Russian Twitter Trolls

William Lyon - Neo4j

As part of the US House Intelligence Committee investigation into how Russia may have influenced the 2016 US election, Twitter released the screen names of nearly 3000 Twitter accounts tied to Russia's Internet Research Agency. These accounts were immediately suspended, deleting the data from Twitter and Twitter's developer API. In this talk we show how we can reconstruct a subset of the Twitter network of these Russian troll accounts and apply graph analytics to the data to try to uncover how these accounts were spreading fake news.
This case-study-style presentation will show how we collected and munged the data, taking advantage of the flexibility of the property graph. We'll dive into how NLP and graph algorithms like PageRank and community detection can be applied in the context of social media to make sense of the data. We'll show how Cypher, the query language for graphs, is used to work with graph data. And we'll show how visualization is used in combination with these algorithms to interpret the results of the analysis and to help share the story of the data.
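PageRank, one of the algorithms mentioned, can be sketched as a small power iteration in plain Python. The retweet-style toy graph and account names below are illustrative assumptions, not the talk's actual data.

```python
def pagerank(adj: dict, damping: float = 0.85, iters: int = 50) -> dict:
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbors]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}   # teleport mass
        for v, outs in adj.items():
            if not outs:                                # dangling node: spread evenly
                for w in nodes:
                    new[w] += damping * rank[v] / n
            else:
                for w in outs:
                    new[w] += damping * rank[v] / len(outs)
        rank = new
    return rank

# hypothetical retweet graph: several accounts amplify "troll1"
adj = {"troll1": ["troll2"], "troll2": ["troll1"], "acct3": ["troll1"], "acct4": ["troll1"]}
ranks = pagerank(adj)
print(max(ranks, key=ranks.get))   # troll1
```

In practice Neo4j runs this via its graph algorithms library rather than client-side code, but the ranking it produces is the same idea: accounts that many other accounts point at float to the top.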

Understanding People Using Three Different Kinds of Graphs

Misty Nodine - Spiceworks

There are various ways that we can learn about people using graph-based approaches.
Social graphs – These graphs help us understand people via the connections they have with other people. They are characterized by having one node type (person) and one edge type (whatever social relationship the graph represents). Typical questions we ask in this space are: How important is this person in this relationship? How well connected are the people? What are the interesting groups?
Knowledge graphs – These graphs represent information we have about a user: the things we can know about them. For instance, a knowledge graph may have nodes not only for people but also for places or companies. There are also a variety of edge types, like 'lives_in' between a person and a city. Knowledge graphs typically take two forms: RDF or entity-relationship. The RDF representations are also related to ontologies and the semantic web. Knowledge graphs enable you to leverage existing knowledge, or knowledge related to other people, to understand a person. Hence, these are graphs that we reason over. Example questions that a knowledge graph might answer include: How big a company does this person work for?
Probabilistic graphical models – Probabilistic graphical models allow us to infer information about a person from things we have observed directly, via probabilistic relationships. In a PGM, the nodes represent specific things you can observe (variables), and each edge captures the conditional dependency between two variables. In real life, we observe actual values for some subset of the nodes and can then compute the probabilities for the values of the unobserved variables.
This talk will provide an overview of these three different kinds of graphs and their desirable properties, and the algorithms and approaches that you use over those graphs to understand more about a person.
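Of the three, the probabilistic graphical model is the easiest to make concrete: its basic update, inferring a hidden variable from one observed neighbor, is just Bayes' rule applied along an edge. The trait names and probabilities below are made up for illustration.

```python
def posterior(prior: float, p_obs_given_true: float, p_obs_given_false: float) -> float:
    """Bayes' rule on one edge of a PGM: update belief in a hidden trait after one observation."""
    num = p_obs_given_true * prior
    den = num + p_obs_given_false * (1.0 - prior)
    return num / den

# hypothetical numbers: hidden trait "is a developer", observed "visits a tech forum"
prior = 0.10                       # P(developer) before observing anything
p = posterior(prior, p_obs_given_true=0.90, p_obs_given_false=0.20)
print(round(p, 3))   # 0.333
```

A full PGM chains many such updates (belief propagation), but each edge carries exactly this kind of conditional relationship.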

Graph Convolutional Networks for Node Classification

Steve Purves - Expero

We describe a method of classifying nodes in an information network by application of a non-Euclidean convolutional neural network. The convolutional layers are kernelized to operate directly on the natural manifold of the information space, and thus produce output more accurate than analysis on information arbitrarily embedded in a Euclidean geometry. First, we describe the benefits of operating in a non-Euclidean geometry. We then sketch out how graph convolutional networks work. Finally, we demonstrate the application of this technique by predicting the credit-worthiness of applicants based on their population characteristics and their relationships to other individuals.
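A single graph-convolution layer of the kind the abstract describes can be written out directly. This is a minimal pure-Python illustration of the standard propagation rule H' = ReLU(D^-1/2 (A+I) D^-1/2 H W), with toy inputs; it is not the talk's kernelized, non-Euclidean variant.

```python
import math

def matmul(A, B):
    """Plain nested-list matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 . H . W)."""
    n = len(A)
    # add self-loops so each node keeps its own features
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    # symmetric degree normalization
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    Z = matmul(matmul(norm, H), W)
    return [[max(0.0, z) for z in row] for row in Z]

# toy 3-node path graph, 2 features per node, hypothetical weights
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(A, H, W))
```

Stacking a few of these layers and ending with a softmax over class labels gives the node classifier; credit-worthiness prediction would swap in real adjacency, feature, and learned weight matrices.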

Writing Distributed Graph Algorithms

Andrew Ray - Sam's Club

Distributed graph algorithms are an important concept for understanding large scale connected data. One such algorithm, Google’s PageRank, changed internet search forever. Efficient implementations of these algorithms in distributed systems are essential to operate at scale.
This talk will introduce the main abstractions for these types of algorithms. First we will discuss the Pregel abstraction created by Google to solve the PageRank problem at scale. Then we will discuss the PowerGraph abstraction and how it overcomes some of the weaknesses of Pregel. Finally we will turn to GraphX and how it combines together some of the best parts of Pregel and PowerGraph to make an easier to use abstraction.
For all of these abstractions we will discuss the implementations of three key examples: Connected Components, Single Source Shortest Path, and PageRank. For the first two abstractions this will be in pseudo code and for GraphX we will use Scala. At the end we will discuss some practical GraphX tips and tricks.
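The vertex-centric style these abstractions share can be illustrated with Connected Components, one of the three examples above. A minimal single-process Python sketch of Pregel-like supersteps, with message passing and "vote to halt" simulated in memory (the distributed versions shard vertices across workers):

```python
def connected_components(vertices, edges):
    """Pregel-style connected components: each vertex repeatedly adopts the
    smallest label seen among its neighbors, one superstep at a time."""
    label = {v: v for v in vertices}
    neighbors = {v: set() for v in vertices}
    for u, w in edges:
        neighbors[u].add(w)
        neighbors[w].add(u)
    active = set(vertices)                 # vertices with messages to send
    while active:
        # superstep: every active vertex sends its current label to its neighbors
        inbox = {}
        for v in active:
            for w in neighbors[v]:
                inbox.setdefault(w, []).append(label[v])
        active = set()
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < label[v]:            # stay active only if the label improved
                label[v] = best
                active.add(v)
    return label

print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

PowerGraph's gather-apply-scatter and GraphX's `aggregateMessages` restructure exactly this loop to cope with high-degree vertices and to fit Spark's dataflow model.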

G-CORE: A Core for Future Graph Query Languages, designed by the LDBC Graph Query Language Task Force

Juan Sequeda - Capsenta

In this talk, Juan will report on a community effort between industry and academia to shape the future of graph query languages. We argue that existing graph database management systems should consider supporting a query language with two key characteristics. First, it should be composable, meaning, that graphs are the input and the output of queries. Second, the graph query language should treat paths as first-class citizens. Our result is G-CORE, a powerful graph query language design that fulfills these goals, and strikes a careful balance between path query expressivity and evaluation complexity.
This work is the culmination of 2.5 years of intensive discussion between the LDBC Graph Query Language Task Force and members of industry (Capsenta, HP, Huawei, IBM, Neo4j, Oracle, SAP and Sparsity) and academia (CWI Amsterdam, PUC Chile, Technische Universiteit Eindhoven, Technische Universitat Dresden, Universidad de Chile, Universidad de Talca).
Link to paper:

Fishing Graphs in a Hadoop Data Lake

Claudius Weinberger - ArangoDB

Hadoop clusters can store nearly everything in your data lake cheaply and blazingly fast. Answering questions and gaining insights from this ever-growing stream becomes the decisive part for many businesses. Increasingly, data has a natural structure as a graph, with vertices linked by edges, and many questions about the data involve graph traversals or other complex queries for which one does not have an a priori bound on the length of paths.
Spark with GraphX is great for answering relatively simple graph questions which are worth starting a Spark job for, because they essentially involve the whole graph. But does it make sense to start one for every ad-hoc query, and is it suitable for complex real-time queries?

Knowledge Graph Sessions

Navigating Time and Probability in Knowledge Graphs

Jans Aasman - Franz, Inc.

The market for knowledge graphs is rapidly developing and evolving to solve widely acknowledged deficiencies with data warehouse approaches. Graph databases are providing the foundation for these knowledge graphs, and in our enterprise customer base we see two approaches forming: static knowledge graphs and dynamic, event-driven knowledge graphs. Static knowledge graphs focus mostly on metadata about entities and the relationships between these entities, but they don't capture ongoing business processes. DBpedia, Geonames, Census and Pubmed are great examples of static knowledge graphs.
Dynamic knowledge graphs are used in the enterprise to facilitate internal processes, facilitate the improvement of products or services, or gather dynamic knowledge about customers. I recently authored an IEEE article describing this evolution of knowledge graphs in the enterprise. During this presentation I will describe two critical success factors for dynamic knowledge graphs: a uniform way to model, query and interactively navigate time, and the power of incorporating probabilities into the graph. The presentation will cover three use cases and live demos showing how the confluence of machine learning, visual querying, distributed graph databases, and big data not only displays links between objects but also quantifies the probability of their occurrence.

Building a Knowledge Graph

Dan Bennett - Thomson Reuters

Just a few years ago a knowledge graph was the domain of academic papers; today they underpin the natural language capabilities of Alexa, Siri, Cortana and Google Now. Graphs are a natural fit for this use case: treating every data item as equivalent and embracing rapid schema mutation. For the past few years, Thomson Reuters has been building a professional information knowledge graph to power our next generation of products. Our graph is RDF-based, fast growing, and supports a number of different products and user experiences. In this session, Dan will cover our experiences, architecture, tools and lessons learned from building, integrating and maintaining a 100bn triple graph.

Knowledge Graphs: You're doing them wrong!

Michael Grove - Stardog Union

As organizations strive to become more data driven and look for ways to better manage and utilize their data, many look to Knowledge Graphs as the answer. While a Knowledge Graph is the only way to effectively analyze, utilize, and monetize enterprise data at scale, just throwing some data into a plain graph and declaring it a "Knowledge Graph" doesn't cut it, yet many organizations make this mistake. This approach simply creates another data silo; it doesn't address the fundamental data challenge these organizations face, and it fails to create the kind of data infrastructure they need to accomplish their goals.
In this talk we will provide a short overview of the data silo problem as well as a more robust definition of what exactly an Enterprise Knowledge Graph _is_ and the kinds of features it needs to have in order to provide the capabilities required to help an enterprise achieve its goals. We will also provide a demo that brings together a variety of public data sources into a Knowledge Graph, demonstrating how going beyond a simple graph structure yields a much more powerful platform for today's enterprises.

DBpedia - A Global Open Knowledge Network

Sebastian Hellmann - DBpedia Association

In the last 10 years DBpedia has developed into one of the most successful knowledge graph projects, with a thriving community. After the foundation of the DBpedia Association in 2014, there was a three-year-long discussion about the new strategy and identity of DBpedia, to further push Open Data as well as the economic exploitation of Open Data. The economic exploitation in particular poses a significant challenge, as there are very few successful Open Data business models compared to open-source models. Our belief is that open data models - unlike content or software - work best in a networked economy, i.e. a collaborative environment. In this presentation, which was co-created by Sören Auer with feedback from the whole DBpedia community, I will introduce the new DBpedia incubator model (inspired by the Apache Software Foundation and GitHub), which can help organisations analyse their current state of data governance, and I will show concrete scenarios where open knowledge graphs can provide economic benefits. All of this is backed up and supported by the new DBpedia platform, which is currently being developed.

Building advanced search and analytics engines over arbitrary domains...without a data scientist

Mayank Kejriwal - USC Information Sciences Institute (ISI)

Although search engines like Google work well for many everyday search needs, there are many use cases to which they don't apply. For example, Google does not allow you to limit search to a given arbitrary 'domain' of your choice, be it publications, bioinformatics, or stocks, and it does not offer customized analytics over the domain that you would get if you were able to query the Web like a database. In the past, building such domain-specific 'search and analytics' engines required a full team of engineers and data scientists that would have to collect and crawl the data, set up the infrastructure, write and configure code, and implement complex machine learning algorithms e.g., for extracting useful information from webpages using natural language processing.
The open source Domain-specific Insight Graph (DIG) architecture meets the challenge of domain-specific search by semi-automatically structuring an arbitrary-domain Web corpus into an inter-connected 'knowledge graph' of entities, attributes and relationships. DIG provides the user with intuitive interfaces to define their own schema, customize search, and ultimately, build an entire engine in just a few hours of (non-programming) effort. The search engine itself, once set up, can be used by anyone who has access credentials, and, in addition to structured, faceted and keyword-based search, allows for complex analytics that includes geospatial and temporal analysis, network analysis and dossier generation. The approach is now widely used by law enforcement in the US for important social problems like combating human trafficking, and new uses for it have continued to emerge in DARPA, IARPA and NSF projects. In this talk, I will describe the problem of domain-specific search and the knowledge graph-centric architecture of DIG. I will also cover some important use cases, especially in social domains, for which DIG has already been instantiated and deployed.

Cognitive Graph Analytics on Company Data and News: Popularity ranking, importance and similarity

Atanas Kiryakov - Ontotext

Analyzing diverse data from multiple sources requires concept and entity awareness – the kind of knowledge that people have when saying "I am aware of X". Matching entities across data sources or recognizing mentions in text requires disambiguation – something that people do with ease and computers often fail to do right, because an average graduate has awareness of a wide set of entities and concepts and computers do not. The most common types of entities when dealing with business information are people, organizations and locations (POL). Ontotext's POL Knowledge Graph will be presented; it provides entity awareness of all locations and the globally most popular companies and people. I will demonstrate graph analytics on a knowledge graph of about 2 billion triples loaded in the Ontotext GraphDB engine. The graph combines several open data sources mapped to the FIBO ontology and interlinks their entities with 1 million news articles. The demonstration will include: importance ranking of nodes based on graph centrality; popularity ranking based on news mentions of a company and its subsidiaries; retrieval of similar nodes in a knowledge graph; and determining the distinguishing features of an entity.

Biorevolutions: Machine Learning and Graph Analysis Illuminate Biotechnology Startup Success

Gunnar Kleemann - Berkeley Data Science Group / Kiersten Henderson - Austin Capital Data Group

Biotechnology is a multi-billion dollar industry. But if only 11% of startups succeed, which companies are good investments? To gain insight into the likelihood of biotechnology startup success, we leveraged the biotech domain knowledge graph developed at Berkeley Data Science Group (BDSG). The BDSG analysis used a machine-learning-based predictive model built on publicly available data about biotech startups. Based on this model, some of the major predictors of US biotech startup success are the percentage of employees who are scientists and a company's geographic location.
To further explore the relationship between startup success and these two features, we turned to a GRAKN.AI knowledge graph of scientific publications. This knowledge graph includes information on the subject matter of scientific publications as well as the scientists who collaborated to publish together. Using this publication graph, we explored the collaboration style of scientists at startups in different cities and found a range of collaboration networks, from close-knit to broad. We also investigated how scientists at startups in different parts of the country differ in the breadth of their subject matter expertise. We will discuss these and other insights gleaned as we applied graph analysis to patterns in biotech startups. Our analysis suggests further avenues to explore when refining the accuracy of our model for predicting biotechnology startup success.

Integrating Semantic Web Technologies in the Real World: A journey between two cities

Juan Sequeda - Capsenta

An early vision in Computer Science has been to create intelligent systems capable of reasoning on large amounts of data. Today, this vision can be delivered by integrating Relational Databases with Semantic Web technologies via the W3C standards: a graph data model (RDF), ontology language (OWL), mapping language (R2RML) and query language (SPARQL). The research community has successfully been showing how intelligent systems can be created with Semantic Web technologies, dubbed now as Knowledge Graphs. However, where is the mainstream industry adoption? What are the barriers to adoption? What are the open scientific problems that need to be addressed to overcome the barriers?
This talk will chronicle our journey of deploying Semantic Web technologies with real-world users to address Business Intelligence and Data Integration needs, describe the technical and social obstacles present in large organizations and the scientific challenges that require attention, and argue for the resurrection of Knowledge Engineers.

Silicon Valley vs New York: Who has Better Data Scientists? (a knowledge graph)

Denis Vrdoljak / Gunnar Kleemann - Berkeley Data Science Group

The story of Berkeley Data Science Group started when a New York Data Scientist met a Silicon Valley Data Scientist. Along the way, we built several Data Science tools, including one we showed off at Data Day Seattle recently! We built a tool to analyze and determine the necessary skills for a given job, along with potential equivalent skills. In this talk, we'll use our tool and the underlying knowledge graph to contrast the differences in Data Science skill sets between the coasts.
We’ll cover the basics of how our system works and how we used Graph Databases to help us model and analyze traditional NLP problems. We’ll show you the ontology we used to model the information we extracted into a knowledge graph, and how we applied the concept of lazy evaluation to simplify our application. We’ll talk about our experiences with and choices of Graph Databases, and why we chose to use the ones we use in our app.
But, most importantly, in this talk we’ll try to answer a very important question: Which coast has better Data Scientists?

Confirmed Graph Day Speakers

Jans Aasman (SF Bay)

Jans Aasman (Wikipedia / LinkedIn) is a Ph.D. psychologist and expert in Cognitive Science - as well as CEO of Franz Inc., an early innovator in Artificial Intelligence and provider of the graph database AllegroGraph. As both a scientist and CEO, Dr. Aasman continues to break ground in the areas of Artificial Intelligence and Knowledge Graphs as he works hand-in-hand with numerous Fortune 500 organizations as well as US and foreign governments. Jans recently authored an IEEE article on “Enterprise Knowledge Graphs”.
Dr. Aasman spent a large part of his professional life in telecommunications research, specializing in applied Artificial Intelligence projects and intelligent user interfaces. He gathered patents in the areas of speech technology, multimodal user interaction and recommendation engines while developing precursor technology for tablets and personal assistants. He was also a professor in the Industrial Design department of the Technical University of Delft. Dr. Aasman is a noted speaker at such conferences as Smart Data, NoSQL Now, the International Semantic Web Conference, GeoWeb, AAAI, Enterprise Data World, Text Analytics, and TTI Vanguard, to name a few.
Jans will be giving the following Graph Day presentation: Navigating Time and Probability in Knowledge Graphs.

Dave Bechberger (Houston)

Dave Bechberger is a Sr. Architect at Gene by Gene, a genetic genealogy and bioinformatics company, where he works extensively on developing their next-generation data architecture. Dave has spent his career engaging in full stack software development but specializes in building data architectures in complex data domains such as bioinformatics, oil and gas, and supply chain management. He uses his knowledge of graph and other big data technologies to build out highly performant and scalable systems. Dave has previously spoken at a variety of international technical conferences including NDC Oslo, NDC London, and Graph Day Texas.
Dave will co-present the following Graph Day session: Improving Graph Based Entity Resolution using Data Mining and NLP.

Dan Bennett (SF Bay) @nonodename

Dan Bennett (Linkedin) is a technology leader within the corporate technology platforms division of Thomson Reuters.
Dan leads a multidisciplinary team of product, program & project managers, engineers, researchers and UX designers responsible for building Big, Open and Linked (Graph) Data capabilities for re-use across the company and sale to customers. This includes Natural Language Processing products like Thomson Reuters Intelligent Tagging, open data at, Data Fusion graph analytics, and the pan-company Knowledge Graph.
A native of the UK, Dan is based in the Twin Cities and, depending on the season, when not at work or home is either skiing, running or sailing.
Dan will present the following Knowledge Graph session: Building a Knowledge Graph.

Arnaud De Moissac (Paris)

Arnaud De Moissac (LinkedIn) is co-founder of DCbrain, an enterprise-scale IoT startup focused on delivering real-time intelligence to multi-physical networks (energy, cooling, ...) by modeling flows. Arnaud has 10 years of experience in telco and IT networks and holds several patents. He also worked for 5 years in the energy efficiency area. He holds two master's degrees, in electrical engineering and IT architecture.
Arnaud will present the following Graph Day session: AI and Graph to optimize steam process in a large process plant.

David Gilardi (Orlando)

David Gilardi is a Technical Evangelist at DataStax, and is a total nerd for distributed databases - with a particular interest in distributed graph. He has over 20 years of relevant experience in programming, database administration, cloud, server/network monitoring, and analytics. Before his time at DataStax he was Senior Development Manager at Hobsons, an education services company, responsible for a flagship SaaS CRM product deployed on hybrid cloud using a combination of relational and NoSQL database technologies.
David will be presenting the following Graph Day session: 3 ways to build a near real-time recommendation engine

Atanas Kiryakov ( Bulgaria )

Atanas Kiryakov (LinkedIn) is the founder and CEO of Ontotext – a Semantic Web pioneer and vendor of semantic technology. Atanas is a member of the board of the Linked Data Benchmarking Council, a standardization body whose members include all major graph database vendors.
Atanas is an expert in semantic graph databases, reasoning, knowledge graph design, text mining, semantic tagging, linking and search, and the author of signature academic publications with more than 2500 citations. Until 2010 Atanas was product manager of GraphDB (formerly OWLIM) – a leading RDF graph database serving mission-critical projects in organizations like the BBC, Financial Times, Nikkei, S&P, Springer Nature, John Wiley, Elsevier, the UK Parliament, Kadaster.NL, the National Gallery of the US, and top-5 US banks.
As CEO of Ontotext, Atanas has supervised multiple high-profile semantic technology projects across different sectors: media and publishing, market and investment information, cultural heritage, and government.
Atanas will be presenting the following Graph Day session: Cognitive Graph Analytics on Company Data and News: Popularity ranking, importance and similarity

Dr. Denise Koessler Gosnell (Charleston) @DeniseKGosnell

In August 2017, Dr. Denise Gosnell transitioned into a Solutions Architect position with DataStax, where she aspires to build upon her experiences as a data scientist and graph architect to further their established line of graph solutions. Prior to her role with DataStax, Dr. Gosnell was a Data Scientist and Technology Evangelist at PokitDok. During her three years with PokitDok, she built software solutions for, and spoke at over a dozen conferences on, permissioned blockchains, machine learning applications of graph analytics, and data science within the healthcare industry.
Dr. Gosnell earned her Ph.D. in Computer Science from the University of Tennessee. Her research on how our online interactions leave behind unique identifiers that form a “social fingerprint” led to presentations at major conferences from San Diego to London and drew the interest of such tech industry giants as Microsoft Research and Apple. Additionally, she was a leader in addressing the underrepresentation of women in her field and founded a branch of Sheryl Sandberg’s Lean In Circles.
Dr. Gosnell will be giving the following Graph Day presentation: Everything is not a graph problem (but there are plenty).

Michael Grove (Washington, DC) @mikegrovesoft

Michael Grove is VP of Engineering and co-founder of Stardog, where he oversees the development of the Stardog Knowledge Graph Platform. Michael studied Computer Science at the University of Maryland and is an alumnus of its well-regarded MIND Lab, which specialized in semantic technologies. Before Stardog, he worked at Fujitsu Research on the use of graphs and semantic technologies in pervasive computing environments. Michael is an expert in large-scale database and reasoning systems and has worked with graphs and graph databases for nearly fifteen years.
Michael will be giving the following Graph Day presentation: Knowledge Graphs: You're doing them wrong!

Sebastian Hellmann (Leipzig)

Sebastian Hellmann is the Executive Director and a board member of the non-profit DBpedia Association. He completed his PhD thesis, on the transformation of NLP tool output to RDF, under the guidance of Jens Lehmann and Sören Auer at the University of Leipzig in 2014. Sebastian is a senior member of the “Agile Knowledge Engineering and Semantic Web” (AKSW) research center, which currently has 50 researchers (PhDs and senior researchers) focusing on semantic technology research, often in combination with other areas such as machine learning, databases, and natural language processing. He is also head of the “Knowledge Integration and Language Technologies (KILT)” Competence Center at InfAI. Sebastian is a contributor to various open-source projects and communities such as DBpedia, NLP2RDF, DL-Learner and OWLG, and has written code in Java, PHP, JavaScript, Scala, C & C++, MatLab, Prolog and Smodels, but now does everything in Bash and Zsh since he discovered the Ubuntu terminal. He is the author of over 80 peer-reviewed scientific publications (h-index of 21 and over 4,300 citations on Google Scholar) and of a not-yet-deleted Wikipedia article about Knowledge Extraction. Currently, he is project manager for Leipzig University and InfAI of the EU H2020 projects ALIGNED and FREME and the BMWi-funded project Smart Data Web. Before that, he was also involved in other funded projects such as FREME (EU H2020), LIDER (EU FP7), BIG and LOD2. Sebastian was a chair at the Open Knowledge Conference in 2011, the Workshop on Linked Data in Linguistics 2012, the Linked Data Cup 2012, the Multilingual Linked Data for Enterprises (MLODE) 2012 workshop, the NLP & DBpedia Workshop 2014, the SEMANTiCS 2014, 2015 and 2016 conferences, as well as the KEKI Workshop 2016.
At MLODE 2012, he helped run a hackathon that bootstrapped an initial version of the Linguistic Linked Open Data cloud image, which led to the LIDER project and now publishes regular updates (thanks to John McCrae).
Sebastian will be giving the following knowledge graph presentation: DBpedia - A Global Open Knowledge Network

Kiersten Henderson (Seattle)

Kiersten Henderson (LinkedIn) recently joined Austin Capital Data Corp. as a Data Scientist. She has over 15 years of expertise in biomedical research. She received her PhD in Genetics and Cell Biology from Cornell Medical School and did her postdoctoral research on the cell biology of aging at the Fred Hutchinson Cancer Research Center in Seattle, where she studied how interconnectivity between cellular subsystems exacerbates cellular decline and limits lifespan. Kiersten is currently working towards her Master's in Data Science at UC Berkeley and is excited to use graph analysis to uncover hidden relationships in all kinds of data.
Kiersten will co-present the following Knowledge Graph session: Biorevolutions: Machine Learning & Graph Analysis Illuminate Biotechnology Startup Success.

Mayank Kejriwal (Los Angeles) @kejriwal_mayank

Mayank Kejriwal is a research scientist and lecturer at the University of Southern California's Information Sciences Institute (ISI). He received his Ph.D. from the University of Texas at Austin under Daniel P. Miranker. His dissertation involved Web-scale data linking, and in addition to being published as a book, was recently recognized with an international Best Dissertation award in his field. Some of his projects at ISI, all funded by either DARPA or IARPA, include: automatically extracting information from large Web corpora and building search engines over them (the topic of his talk); 'automating' a data scientist with advanced meta-learning techniques; representing, and reasoning over, terabyte-scale knowledge graphs; combining structured and unstructured data for causal inference; constructing, embedding and analyzing networks over billion-tweet scale social media; and building a platform that makes research easy for geopolitical forecasters. His research sits at the intersection of knowledge graphs, social networks, Web semantics, network science, data integration and AI for social good. He is currently co-authoring a textbook on knowledge graphs (MIT Press, 2018), and has delivered tutorials and demonstrations at numerous conferences and venues, including KDD, AAAI, ISWC and WWW.
Mayank will be giving the following presentation: Building advanced search and analytics engines over arbitrary domains...without a data scientist.

Gunnar Kleemann is a Data Scientist with the Berkeley Data Science Group (BDSG). He is interested in how data science facilitates biological discovery and lowers the barrier to high-throughput research, particularly in small, independent labs. In addition to his work with BDSG, he is also involved in the development and implementation of technologies like the ATX Hackerspace Biology Laboratory.
Gunnar holds a PhD in Molecular Genetics from Albert Einstein College of Medicine and a Master’s in Data Science from UC Berkeley. He did post-doctoral research on the genomics of aging at Princeton University, where his research focused on developing high-throughput robotic assays to understand how genetic changes alter lifespan and reproductive biology.
Gunnar will co-present the following Knowledge Graph sessions:
Silicon Valley vs New York: Who has Better Data Scientists? (a knowledge graph), and
Biorevolutions: Machine Learning & Graph Analysis Illuminate Biotechnology Startup Success.

Corey Lanum (Boston) @corey_lanum

Corey Lanum (LinkedIn) has a distinguished background in graph visualization. Over the last 15 years, he has managed technical and business relationships with dozens of the largest defense and intelligence agencies in North America, in addition to working with many security and anti-fraud organizations in private industry. Prior to joining Cambridge Intelligence as its US Manager, Corey helped the customers of i2 (now IBM) and SS8 solve their most complex graph data challenges.
Corey is the author of Visualizing Graph Data from Manning Publications.
Corey will co-present the following Graph Day session: How to Destroy Your Graph Project with Terrible Visualization.

Victor Lee (Kent, Ohio)

Dr. Victor Lee is Senior Product Manager at TigerGraph, bringing together a strong academic background, decades of experience in the technology sector, and a deep commitment to quality and serving customer needs. His first stint in Silicon Valley was as an IC circuit designer and technology transfer manager, before he returned to school for his computer science PhD, focusing on graph data mining. He received his BS in Electrical Engineering and Computer Science from UC Berkeley, MS in Electrical Engineering from Stanford University, and PhD in Computer Science from Kent State University. Before joining TigerGraph, Victor was a visiting professor at John Carroll University.
Dr. Lee will be giving the following Graph Day presentation: Real-time deep link analytics: The next stage of graph analytics

William Lyon (SFBay) @lyonwj

William Lyon is a software developer at Neo4j, the open source graph database. As an engineer on the Developer Relations team, he works primarily on integrating Neo4j with other technologies, building demo apps, helping other developers build applications with Neo4j, and writing documentation. Prior to joining Neo, William worked as a software developer for several startups in the real estate software, quantitative finance, and predictive API fields. William holds a Masters degree in Computer Science from the University of Montana. You can find him online at
William will be giving the following Graph Day presentation: Graph Analysis of Russian Twitter Trolls

Jean Reynald Macé (Paris)

Jean Reynald Macé (LinkedIn) graduated from ESSEI, a top-tier engineering school, in the mid-1980s and kickstarted a high-tech career at Thales, one of France's most renowned technology companies. Thales, France's equivalent of Lockheed Martin, hired Jean-Reynald to work on complex embedded navigation systems as well as encrypted communication protocols. He then built a valuable R&D track record at Xilinx, a company headquartered in San Jose famous for its programmable devices, notably those used in Martian rovers. Finally, Jean-Reynald landed at his current position as one of the innovation directors of Areva, the world leader in the nuclear fuel cycle.
Jean Reynald will co-present the following Graph Day session: AI and Graph to optimize steam process in a large process plant.

Misty Nodine (Austin)

Misty Nodine (LinkedIn / GitHub) has a long history of trying to understand, organize, and make sense of complexity. She is a respected researcher and developer in the areas of natural language processing, information and knowledge management, agent-based information systems, communications system management, and collaboration management. More recently, she has focused on data pipelines and data architectures, specifically for developing a comprehensive understanding of users to improve recommendations and ad targeting.
Misty received her Ph.D. in Computer Science from Brown University in 1993. She received her S.B. and S.M. in EECS from the Massachusetts Institute of Technology. She has 30+ years of experience in computer and data science, both in industrial research and in startup companies. She is currently the Data Architect at Spiceworks.
Misty will be giving the following Graph Day presentation: Understanding People Using Three Different Kinds of Graphs

Jason Plurad (Raleigh-Durham) @pluradj

Jason Plurad is a software developer on IBM's Open Technologies team. He is a committer on Apache TinkerPop, an open source graph computing framework. Jason engages in full stack development (including front end, web tier, NoSQL databases, and big data analytics) and promotes adoption of open source technologies into enterprise applications, service, and solutions. He has spoken previously at IBM conferences (Innovate, Insight) and Triangle Hadoop Users Group meetups.
Jason will be presenting the following Graph Day session: Powers of Ten Redux

Steve Purves (Tenerife, Islas Canarias) @stevejpurves

Steve Purves, Senior Software Developer at Expero, describes himself as an engineer first and foremost. He is comfortable working full-stack and cross-platform in a range of languages, and is happiest when some mathematical or scientific analysis is sprinkled in. He graduated in electrical engineering, specializing in signal and image processing, which he took into the scientific computing field in the oil and gas industry.
During that time his work was split three ways: developing low-level number-crunching libraries (C, C++, CUDA) and the cross-platform desktop application with 3D visualization to drive them; applied research in signal processing and numerical analysis algorithm development for 3D seismic analysis, during which he was an IEEE journal geek; and finally managing R&D and product development teams as CTO, championing practices like TDD, BDD, and Agile to get it done.
Around 5 years ago, the excitement of daily binary builds wore thin and Steve got hooked on building applications for the web, starting out with web-desktop integration work for seismic analysis on the iPad. Since then activities have included working on full-stack web applications, with and without desktop integration, for startups in sectors such as Dental, TV Production and Software Micro-Consulting.
Today, he builds reactive web applications with Expero, which feeds his desire to learn and work on industrial-strength projects. Steve waits patiently, with ES6 JavaScript and Jupyter Notebooks at the ready, for the imminent explosion of scientific computing on the web.
Steve will be giving the following Graph Day presentation: Graph Convolutional Networks for Node Classification

Andrew Ray (Bentonville, Arkansas)

Andrew Ray is a Senior Technical Expert at Sam’s Club Technology. He is passionate about big data and has extensive experience working with Apache Spark and Hadoop. Andrew is an active contributor to the Apache Spark project including SparkSQL and GraphX. At Walmart Andrew built an analytics platform on Hadoop that integrated data from multiple retail channels using fuzzy matching and distributed graph algorithms. Andrew also led the adoption of Spark at Walmart from proof-of-concept to production. Andrew earned his Ph.D. in Mathematics from the University of Nebraska, where he worked on extremal graph theory.
Andrew will be giving the following Graph Day presentation: Writing Distributed Graph Algorithms

Juan Sequeda (Austin) @juansequeda

Dr. Juan Sequeda is the co-founder of Capsenta, a spin-off from his research, and the Senior Director of Capsenta Labs. He holds a PhD in Computer Science from the University of Texas at Austin. His research interests lie at the intersection of logic and data, in particular between the Semantic Web and relational databases, for data integration, ontology-based data access, and semantic/graph data management. Juan is the recipient of the NSF Graduate Research Fellowship, 2nd Place in the 2013 Semantic Web Challenge, the Best Student Research Paper award at the 2014 International Semantic Web Conference, and the 2015 Best Transfer and Innovation Project award from the Institute for Applied Informatics. Juan is the General Chair of AMW 2018, was the PC chair of the ISWC 2017 In-Use track, is on the Editorial Board of the Journal of Web Semantics, is a member of multiple program committees (ISWC, ESWC, WWW, AAAI, IJCAI), and is co-creator of the Consuming Linked Data Workshop series. Juan is a member of the Graph Query Languages task force of the Linked Data Benchmark Council (LDBC) and has also been an invited expert member and standards editor at the World Wide Web Consortium (W3C).
Juan will be giving two presentations: Integrating Semantic Web Technologies in the Real World: A journey between two cities, and G-CORE: A Core for Future Graph Query Languages, designed by the LDBC Graph Query Language Task Force.

Denis Vrdoljak (SF Bay)

Denis Vrdoljak (Co-Founder and Managing Director at the Berkeley Data Science Group (BDSG)) is a Berkeley-trained Data Scientist and a Certified ScrumMaster (CSM) with a background in project management. He has experience working with a variety of data types, from intelligence analysis to electronics QA to business analytics. In data science, his passion and current focus is Machine Learning-based predictive analytics and network graph analysis. He holds a Master's in Data Science from UC Berkeley and a Master's in International Affairs from Texas A&M.
Denis will co-present the following Graph Day session: Silicon Valley vs New York: Who has Better Data Scientists? (a knowledge graph).

Claudius Weinberger (Köln, Germany) @weinberger

Claudius Weinberger is the CEO and co-founder of ArangoDB GmbH, the company behind the identically named NoSQL multi-model database. Claudius has been a serial entrepreneur for the majority of his life. Together with his co-founder, he has been building databases for more than 20 years, starting with in-memory and mostly-memory databases, then moving to key/value stores, multi-dimensional cubes, and ultimately graph databases. Throughout the years he focused mostly on product and project management, further sharpening his vision of the database market. He co-founded ArangoDB in 2012. Claudius studied economics with a focus on business informatics at the University of Cologne. He spends all his free time with his two little daughters, is a judo enthusiast, and occasionally enjoys gardening.
Claudius will present the following Graph Day session: Fishing Graphs in a Hadoop Data Lake.

Ted Wilmes (Oklahoma City) @trwilmes

Ted Wilmes, Data Architect at Expero, is a graduate of Trinity University where he studied computer science and art history. He started his professional career at a not-for-profit research and development institution where he performed contract software development work for a variety of government and commercial clients. During this time he worked on everything from large enterprise systems to smaller, cutting edge research and development projects. One of the most rewarding parts of each of these projects was the time spent collaborating with the customer.
As Ted’s career continued, he moved on to an oil and gas startup and continued to dig deeper into the data side of software development, gaining an even deeper interest in how databases work and how to eke as much performance out of them as possible. During this time he became interested in the application of graph databases to certain problem sets. Today, at Expero, Ted enjoys putting his deep knowledge of transactional graph computing to work as he helps customers of all types navigate the burgeoning property graph database landscape.
Outside of work, Ted enjoys spending time with his family outdoors, listening to and playing loud music, and contributing to the Apache TinkerPop project as a committer and PMC member.
Ted will be giving the Graph Day keynote: The State of JanusGraph 2018