The Data Day Texas 2020 Sessions

We're just now beginning to announce the sessions for Data Day Texas 2020. Check back frequently for updates.

Creating Explainable AI with Rules

Jans Aasman - Franz Inc.

This talk is based on Jans' recent article for Forbes magazine.
"There’s a fascinating dichotomy in artificial intelligence between statistics and rules, machine learning and expert systems. Newcomers to artificial intelligence (AI) regard machine learning as innately superior to brittle rules-based systems, while the history of this field reveals both rules and probabilistic learning are integral components of AI.
This fact is perhaps nowhere truer than in establishing explainable AI, which is central to the long-term business value of AI front-office use cases."
"The fundamental necessity for explainable AI spans regulatory compliance, fairness, transparency, ethics and lack of bias -- although this is not a complete list. For example, the effectiveness of counteracting financial crimes and increasing revenues from advanced machine learning predictions in financial services could be greatly enhanced by deploying more accurate deep learning models. But all of this would be arduous to explain to regulators. Translating those results into explainable rules is the basis for more widespread AI deployments producing a more meaningful impact on society."

Moving Your Machine Learning Models to Production with TensorFlow Extended

Jonathan Mugan - DeUmbra

ML is great fun, but now we want it to solve real problems. To do this, we need a way of keeping track of all of our data and models, and we need to know when our models fail and why. This talk will cover how to move ML to production with TensorFlow Extended (TFX). TFX is used by Google internally for machine-learning model development and deployment, and it has recently been made public. TFX consists of multiple pipeline elements and associated components, and this talk will cover them all, but three elements are particularly interesting: TensorFlow Data Validation, TensorFlow Model Analysis, and the What-If Tool.

The TensorFlow Data Validation library analyses incoming data and computes distributions over the feature values. This can show us which features may not be useful, perhaps because they always have the same value, or which features may contain bugs. TensorFlow Model Analysis allows us to understand how well our model performs on different slices of the data. For example, we may find that our predictive models are more accurate for events that happen on Tuesdays, and such knowledge can be used to help us better understand our data and our business. The What-If Tool is an interactive tool that allows you to change data and see what the model would say if a particular record had a particular feature value. It lets you probe your model, and it can automatically find the closest record with a different predicted label, which allows you to learn what the model is homing in on. Machine learning is growing up.
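The three ideas in the paragraph above can be illustrated with a minimal plain-Python sketch. It does not use the TFX, TensorFlow Data Validation, TensorFlow Model Analysis, or What-If Tool APIs; the toy records, the feature names, and the helper functions (`constant_features`, `accuracy_by_slice`, `nearest_counterfactual`) are hypothetical and only show the kind of question each tool helps answer.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (features, true_label, predicted_label).
records = [
    ({"weekday": "Tue", "country": "US", "amount": 10.0}, 1, 1),
    ({"weekday": "Tue", "country": "US", "amount": 12.0}, 0, 0),
    ({"weekday": "Wed", "country": "US", "amount": 11.0}, 1, 0),
    ({"weekday": "Wed", "country": "US", "amount": 50.0}, 0, 0),
]


def feature_value_counts(records):
    """Count how often each value occurs for each feature (a crude distribution)."""
    counts = defaultdict(Counter)
    for features, _, _ in records:
        for name, value in features.items():
            counts[name][value] += 1
    return counts


def constant_features(records):
    """Data-validation idea: features with a single value carry no signal."""
    counts = feature_value_counts(records)
    return sorted(name for name, c in counts.items() if len(c) == 1)


def accuracy_by_slice(records, slice_feature):
    """Model-analysis idea: accuracy computed separately per slice value."""
    hits, totals = Counter(), Counter()
    for features, truth, predicted in records:
        key = features[slice_feature]
        totals[key] += 1
        hits[key] += int(truth == predicted)
    return {key: hits[key] / totals[key] for key in totals}


def nearest_counterfactual(records, index, numeric_feature):
    """What-if idea: closest record (on one numeric feature) with a different prediction."""
    features, _, predicted = records[index]
    candidates = [r for r in records if r[2] != predicted]
    if not candidates:
        return None
    target = features[numeric_feature]
    return min(candidates, key=lambda r: abs(r[0][numeric_feature] - target))


if __name__ == "__main__":
    print(constant_features(records))
    print(accuracy_by_slice(records, "weekday"))
    print(nearest_counterfactual(records, 0, "amount"))
```

In this toy data, `country` never varies, the Tuesday slice is predicted more accurately than the Wednesday slice, and the nearest record with a different predicted label differs from record 0 by only a small change in `amount`. The real tools operate at much larger scale and with richer statistics than this sketch.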

Automated Encoding of Knowledge from Unstructured Natural Language Text into a Graph Database

Chris Davis - Lymba

Most contemporary data analysts are familiar with mapping knowledge onto tabular data models like spreadsheets or relational databases. However, these models are sometimes too broad to capture subtle relationships between granular concepts among different records. Graph databases provide this conceptual granularity, but they typically require that knowledge be curated and formatted by subject matter experts, which is extremely time- and labor-intensive. This presentation describes an approach to automating the conversion of natural language text into a structured RDF graph database.

NuGraphStore: a Transactional Graph Store Backend for JanusGraph


Dr. Jun Li / Dr. Mohammad Roohitavaf / Dr. Gene Zhang - eBay

JanusGraph is a distributed graph database system with pluggable storage backend servers, such as Cassandra, HBase, or BerkeleyDB (which is non-scale-out). There were no fully transactional scale-out backends for JanusGraph. Without transaction support, applications face challenges in dealing with index/data inconsistency and with inconsistency related to vertices and edges, such as dangling edges, as well as data loss or data duplication. We have been developing a scale-out KCV storage engine with distributed transaction support for JanusGraph, called NuGraphStore. In this talk, we will present the architecture and design of NuGraphStore, its storage engine, and its distributed transaction mechanisms. NuGraphStore is (going to be) open-sourced under the Apache 2.0 license. We invite interested developers and users to join the community to make NuGraphStore the best backend storage engine for JanusGraph. Its distributed transaction protocol could be adapted for use with other KV store engines as well.