Opening Keynote
Gwen Shapira (Cupertino, CA) @gwenshap
Gwen Shapira (LinkedIn) is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building real-time, reliable data-processing pipelines using Apache Kafka. Gwen is an Oracle ACE Director, the co-author of two O'Reilly books, Kafka: The Definitive Guide and Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.
Gwen will be presenting the Opening Keynote: Lies Enterprise Architects Tell
Data Science Keynote
Sean Owen (Austin) @sean_r_owen
Sean Owen (Quora / LinkedIn) is the Data Science Lead at Databricks. Previously, Sean was director of data science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd. (now the Oryx project) to commercialize large-scale real-time recommender systems on Hadoop. Sean is an Apache Spark committer, was a committer and VP for Apache Mahout, and is the co-author of Advanced Analytics with Spark and Mahout in Action. Earlier in his career, Sean was a senior engineer at Google.
Sean will present the following session: Obvious conclusions that are actually wrong.
NLP Keynote
Robert Munro (San Francisco) @WWRob
Robert Munro (LinkedIn) was most recently Chief Technology Officer at Figure Eight (formerly known as CrowdFlower). Previously, he ran Product for AWS's first Natural Language Processing services in the Deep Learning team at Amazon AI. Robert is an expert in combining Human and Machine Intelligence, working with Machine Learning approaches to Text, Speech, Image and Video Processing. Robert has founded several AI companies, building some of the top teams in Artificial Intelligence. He has worked in many diverse environments, from Sierra Leone, Haiti and the Amazon, to London, Sydney and Silicon Valley, in organizations ranging from startups to the United Nations. Robert has published more than 50 papers and has a PhD from Stanford University.
Rob will present the following session: Transfer Learning Today: the Good, the Bad, and the Ugly of NLP in 2019
Data Engineering Keynote
Jesse Anderson (Reno) @jessetanderson
Jesse Anderson is a data engineer, creative engineer, and managing director of the Big Data Institute. He works on Big Data with companies ranging from startups to Fortune 100 companies. This includes training on cutting-edge technologies like Apache Kafka, Apache Hadoop and Apache Spark. He has taught over 30,000 people the skills to become data engineers. He is widely regarded as an expert in the field and is known for his novel teaching practices. Jesse is published by O’Reilly and The Pragmatic Programmers. He has been covered in prestigious publications such as The Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired. You can learn more about Jesse at Jesse-Anderson.com.
Jesse will present the following session: Creating a Data Engineering Culture
Graph Keynote
Dr. Denise Koessler Gosnell (Charleston) @DeniseKGosnell
Dr. Denise Gosnell leads a team at DataStax that builds some of the largest distributed graph applications in the world. Her passion centers on examining, applying, and evangelizing the applications of graph data and complex graph problems. As an NSF Fellow, Dr. Gosnell earned her Ph.D. in Computer Science from the University of Tennessee. Her research coined the concept of "social fingerprinting" by applying graph algorithms to predict user identity from social media interactions. Since then, Dr. Gosnell has built, published, patented, and spoken on dozens of topics related to graph theory, graph algorithms, graph databases, and applications of graph data across all industry verticals.
Dr. Gosnell will be presenting the Graph Summit Keynote: From Theory to Production.
Jans Aasman (SF Bay)
Jans Aasman (Wikipedia / LinkedIn) is a Ph.D. psychologist and expert in Cognitive Science, as well as CEO of Franz Inc., an early innovator in Artificial Intelligence and provider of the graph database AllegroGraph. As both a scientist and CEO, Dr. Aasman continues to break ground in the areas of Artificial Intelligence and Knowledge Graphs as he works hand-in-hand with numerous Fortune 500 organizations as well as US and foreign governments. Jans recently authored an IEEE article on “Enterprise Knowledge Graphs”.
Dr. Aasman spent a large part of his professional life in telecommunications research, specializing in applied Artificial Intelligence projects and intelligent user interfaces. He gathered patents in the areas of speech technology, multimodal user interaction and recommendation engines while developing precursor technology for tablets and personal assistants. He was also a professor in the Industrial Design department of the Technical University of Delft. Dr. Aasman is a noted speaker at conferences such as Smart Data, NoSQL Now, International Semantic Web Conference, GeoWeb, AAAI, Enterprise Data World, Text Analytics, and TTI Vanguard, to name a few.
Dr. Aasman will co-present the following Graph Summit session: The Intelligent Sales Organization Runs on Speech Recognition, Knowledge Graphs and AI.
Jon Allen (San Francisco)
Jon Allen is a Senior Data Scientist at SyncThink and a founder of, and stand-up comedian at, Cheaper Than Therapy. Jon is a physicist who studied at UT Austin’s Center for Relativity. After leaving academia, Jon worked with start-ups from MIT’s Media Lab on automated gait analysis and later, in 2010, co-founded Ravel, which specialized in large-scale data solutions for corporate marketing groups. Jon moved to the Bay Area in 2012 and has worked extensively as a data scientist in the medical and hardware spaces. He also started, runs, and regularly performs at Cheaper Than Therapy, one of the largest independent comedy clubs in the US.
Jon will present the following session: The Role of Data Science
Roger Barga (Seattle)
Roger Barga is General Manager for the New Cloud Service Initiative at Amazon. Prior to that, Roger was general manager and director of development at Amazon Web Services, where he was responsible for Kinesis data streaming services. Previously, Roger was in the Cloud Machine Learning Group at Microsoft, where he was responsible for product management of the Azure Machine Learning service. Roger is also an affiliate professor at the University of Washington, where he is a lecturer in the Data Science and Machine Learning programs. Roger holds a PhD in computer science, has been granted over 30 patents, has published over 100 peer-reviewed technical papers and book chapters, and has authored a book on predictive analytics.
Roger will present the following session: Extracting Real-Time Insights from Streaming Data
Dave Bechberger (Houston) @bechbd
Dave Bechberger is a Solution Architect in the Graph Practice at DataStax, where he helps customers build large, distributed, graph-backed applications. Prior to that he was the Chief Architect at Gene by Gene, a genetic genealogy and bioinformatics company, where he worked to migrate their legacy technology stack to modern technologies, including heavy use of graph databases and Cassandra. Dave has spent his career engaging in full stack software development but specializes in building data architectures in complex data domains such as bioinformatics, oil and gas, and supply chain management. He uses his knowledge of graph and other big data technologies to build highly performant and scalable systems. Dave has previously spoken at a variety of national and international technical conferences, including NDC Oslo and NDC London, as well as previous GraphDay conferences in Texas, San Francisco and Seattle.
Dave will present the following session: Intro to Graph Databases for Data Scientists
Tim Berglund is a teacher, author, and technology leader with Confluent, where he serves as the senior director of developer experience. Tim can frequently be found speaking at conferences internationally and in the United States. He is the co-presenter of various O’Reilly training videos on topics ranging from Git to distributed systems and is the author of Gradle Beyond the Basics. He tweets as @tlberglund, blogs very occasionally at timberglund.com, and is the cohost of the DevRel Radio Podcast. He lives in Littleton, Colorado, with the wife of his youth and their youngest child, the other two having mostly grown up.
Tim will present the following session: Dissolving the Problem: Kafka is more ACID Than Your Database
and will also co-present the following session: Six Things You Need to Know about Cassandra and Kafka.
Michael Berthold (Konstanz)
Michael Berthold is currently president of KNIME.com AG and co-creator of KNIME, the open analytics platform used by thousands of data experts around the world. Since August 2003, Michael has been the Nycomed-Chair for Bioinformatics and Information Mining at Konstanz University, Germany, where his research focuses on using machine learning methods for the interactive analysis of large information repositories in the Life Sciences. Previously he held positions in both academia (Carnegie Mellon, UC Berkeley) and industry (Intel, Tripos).
Michael is Past President of the North American Fuzzy Information Processing Society, Associate Editor of several journals and the President of the IEEE Systems, Man, and Cybernetics Society. He has been involved in the organization of various conferences, most notably the IDA-series of symposia on Intelligent Data Analysis and the conference series on Computational Life Science. Together with David Hand, he co-edited the textbook Intelligent Data Analysis: An Introduction, which has recently appeared in a completely revised second edition. He is also co-author of Guide to Intelligent Data Analysis (Springer Verlag), which appeared in summer 2010. When time permits, Michael still writes code.
Michael will present the following session: Data Science Automation: Facts & Fiction
Ryan Boyd (SF Bay) @ryguyrg
Ryan Boyd (LinkedIn) is an SF-based software engineer at Neo4j focused on helping developers understand the power of graph databases. Previously he was a product manager for architectural software, built applications and web hosting environments for higher education, and worked in developer relations for twenty products during his 8 years at Google. He enjoys cycling, sailing, skydiving, and many other adventures when not in front of his computer.
Ryan will present the following session: Data Science Tools: Cypher for Data Munging
Michelle Casbon (San Francisco) @texasmichelle
Michelle Casbon is a senior engineer on the Google Cloud Platform developer relations team, where she focuses on open source contributions and community engagement for machine learning and big data tools. Michelle’s development experience spans more than a decade and has primarily focused on multilingual natural language processing, system architecture and integration, and continuous delivery pipelines for machine learning applications. Previously, she was a senior engineer and director of data science at several San Francisco-based startups, building and shipping machine learning products on distributed platforms using both AWS and GCP. She especially loves working with open source projects and has contributed to Apache Spark and Apache Flume. Her writing has been featured in the AI section of O’Reilly Radar. Michelle holds a master’s degree from the University of Cambridge.
Michelle will present the following session: Kubeflow explained: Portable machine learning on Kubernetes.
Dr. Artem Chebotko (Houston) @artemchebotko
Dr. Artem Chebotko is a Solutions Architect at DataStax. His core expertise is in data modeling, data management, data mining, and data analytics. For over 15 years, he has been leading and participating in research and development projects on NoSQL, Graph, XML, Relational, and Provenance databases. He is the inventor of the Big Data Modeling Methodology for Apache Cassandra and the author of over 50 research and technical papers published in international journals and conference proceedings. He is an educator with extensive experience in both industry and academic training.
Artem will present the following graph workshop: Hands-On Introduction to Gremlin Traversals
Shannon Copeland (Atlanta)
Shannon Copeland has more than 20 years’ experience driving change and innovation in the services industry. As Chief Operating Officer of N3, Shannon is responsible for overseeing all business operations while growing the company’s profitability and cash flow. Shannon also leads N3’s innovative efforts to develop unique machine learning tools to drive operating efficiency and client results.
Before joining N3, Shannon served as a Director for Huron Consulting and as the Director of Strategy and Group COO of a global law firm leading strategy and operations throughout 18 offices in the United States, Europe, and the Middle East. Prior to that, Shannon held positions at Chevron, Deloitte, and Georgia-Pacific.
Shannon holds a bachelor’s degree in Civil Engineering from the Georgia Institute of Technology and an MBA from the Wharton School.
Shannon will co-present the following Graph Summit session: The Intelligent Sales Organization Runs on Speech Recognition, Knowledge Graphs and AI.
Sanghamitra Deb (SF Bay) @sangha_deb
Sanghamitra Deb is a Data Scientist at Chegg, where she works on problems related to school and college education to sustain and improve the learning process. Her work involves recommendation systems, graph modeling, deep NLP analysis, data pipelines and machine learning. Previously, Sanghamitra was a data scientist at Accenture, where she worked on a wide variety of problems related to data modeling, architecture and visual storytelling. Sanghamitra is active in Data Science outreach and believes in applying analytics to a range of domains such as pharma, HR, customer support, and market research. Prior to becoming a data scientist, she was an astrophysicist who studied the structure of the universe by modeling galaxy clusters.
Sanghamitra will present the following session: Using weak supervision and transfer learning techniques to build knowledge graph to improve student experiences at Chegg.
Joey Echeverria (SF Bay) @fwiffo
Joey Echeverria is the platform technical lead at Splunk, where he builds applications for scaling IT operations on the Apache Hadoop platform. Joey is a committer on the Kite SDK, an Apache-licensed data API for the Hadoop ecosystem. Joey was previously a software engineer at Cloudera, where he contributed to several ASF projects, including Apache Flume, Apache Sqoop, Apache Hadoop, and Apache HBase. Joey is also a coauthor of Hadoop Security, published by O'Reilly Media.
Leanne Fitzpatrick (Manchester) @lk_fitzpatrick
Leanne Fitzpatrick (LinkedIn / GitHub) is a passionate data leader with experience developing, implementing and growing a data science and analytics function within a start-up business, whilst remaining technically hands-on. Leanne has a deep and innovative passion for all things data and is an active part of data communities in Manchester and beyond. She is an Advisory Panel board member at the University of Sheffield Information School. She was recently shortlisted for Data Scientist of the Year at the 2018 Women in Tech Awards. Aside from her enthusiasm for making data concepts simple to understand, in her spare time Leanne enjoys great food and music, is a keen American Football fan and is attempting to get into golf. She is looking forward to becoming a member of the Austin data and tech communities.
Leanne will present the following session: Shipping a Machine Learning model to Production; is it always smooth sailing?
Michael Freedman (LinkedIn) is the cofounder and CTO of TimescaleDB, an open source database that scales SQL for time series data, and a professor of computer science at Princeton University, where his research focuses on distributed systems, networking, and security. Previously, Michael developed CoralCDN (a decentralized CDN serving millions of daily users) and Ethane (the basis for OpenFlow and software-defined networking) and cofounded Illuminics Systems (acquired by Quova, now part of Neustar). He is a technical advisor to Blockstack. Michael’s honors include the Presidential Early Career Award for Scientists and Engineers (PECASE, given by President Obama), the SIGCOMM Test of Time Award, the Caspar Bowden Award for Privacy Enhancing Technologies, a Sloan Fellowship, the NSF CAREER Award, the Office of Naval Research Young Investigator Award, a DARPA Computer Science Study Group membership, and multiple award-winning publications. He holds a PhD in computer science from NYU’s Courant Institute and bachelor’s and master’s degrees from MIT.
Michael will present the following session: Performant time-series data management and analytics with Postgres.
Jeffrey Gleason (Austin)
Jeffrey Gleason is a Jr. Machine Learning Engineer at New Knowledge, where he focuses on natural language processing, time series forecasting and classification, and algorithmic fairness. He holds a degree in Computer Science from Princeton University, where his research focused on algorithmic fairness in the criminal justice system.
Jeffrey will be co-presenting the following session: Data Science: Avengers Style - how to build a data science team that fights bad guys (on the internet).
Michael Hackstein (Köln, Germany) @mchacki
Michael Hackstein is a Senior Graph Specialist at ArangoDB Inc. He holds a Master’s degree in Computer Science and is the creator of ArangoDB’s graph capabilities. During his academic career he focused on complex algorithms, especially graph databases. Michael is an internationally experienced speaker who loves salad, cake and clean code.
Michael will present the following session: A Scalable Graph Database Platform with ArangoDB on Kubernetes.
Jon Haddad (LinkedIn) is the Principal Consultant at The Last Pickle, as well as a committer and PMC member for Apache Cassandra. Prior to The Last Pickle, Jon was a technical evangelist at DataStax. He has worked on dozens of Cassandra clusters across a wide variety of hardware, both on-prem and in the cloud. Jon has contributed to a wide variety of open source projects and has almost 20 years of experience in the field.
Jon will present the following session: 10 Easy Ways to Tune Your Cassandra Cluster
Kristian Hammond (Chicago) @kj_hammond
Kristian Hammond (LinkedIn) is chief scientist at Narrative Science and professor of computer science and journalism at Northwestern University. Previously, Kris founded the University of Chicago’s Artificial Intelligence Laboratory. His research has been primarily focused on artificial intelligence, machine-generated content, and context-driven information systems. He currently sits on a United Nations policy committee run by the United Nations Institute for Disarmament Research (UNIDIR). Kris was also named 2014 innovator of the year by the Best in Biz Awards. He holds a PhD from Yale.
Kristian will be presenting the following session: The Rise of the New AI: The Relationship between the Growth of Data and the AI of Today.
Álvaro Hernández (Madrid)
Álvaro Hernández is a passionate database and software developer. He has been an almost exclusive user of PostgreSQL, as THE database, for more than 15 years. The founder of OnGres, he is dedicated to R&D in databases. Álvaro founded ToroDB and, before that, the “Billion Tables Project”, and keeps working on innovative open source solutions for databases, especially PostgreSQL. In his free time he also contributes to open source and PostgreSQL, such as SCRAM support in PostgreSQL's JDBC driver.
He is a frequent speaker at PostgreSQL, database and Java conferences. Álvaro created the Spanish PostgreSQL user group, one of the largest in the world, with 750+ members.
Álvaro will be presenting the following session: Why PostgreSQL? PostgreSQL 10 coolest features.
Amy Hodler (Kettle Falls, Washington) @amyhodler
Amy Hodler is a network science devotee and AI and Graph Analytics Program Manager at Neo4j. She promotes the use of graph analytics to reveal structures within real-world networks and predict dynamic behavior. She is the co-author of the O’Reilly book, Graph Algorithms: Practical Examples in Apache Spark and Neo4j. Amy helps teams apply novel approaches to generate new opportunities at companies such as EDS, Microsoft, Hewlett-Packard (HP), Hitachi IoT, and Cray Inc. Amy has a love for science and art with a fascination for complexity studies and graph theory.
Amy will present the following session: Community Detection using Graph Algorithms and Neo4j.
Holden Karau (San Francisco) @holdenkarau
Holden Karau is a transgender Canadian, Apache Spark committer, an active open source contributor, and co-author of Learning Spark & High Performance Spark. When not in San Francisco working as a software development engineer at IBM’s Spark Technology Center, Holden talks internationally on Spark and holds office hours at coffee shops at home and abroad. She makes frequent contributions to Spark, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of computers she enjoys dancing & playing with fire.
Holden will be presenting the following session: Understanding Spark Tuning with Auto Tuning (or how to stop your pager going off at 2am*).
Dinesh A. Joshi (SF Bay) @dineshjoshi
Dinesh A. Joshi (LinkedIn) has been a professional Software Engineer for over a decade, building highly scalable real-time Web Services and Distributed Streaming Data Processing Architectures serving over 1 billion devices. Dinesh is an active contributor to the Apache Cassandra codebase. He has a Master’s degree in Computer Science (Distributed Systems & Databases) from Georgia Tech, Atlanta, USA.
Dinesh will be presenting the following session: Need for speed: Boosting Apache Cassandra's performance using Netty.
Sanjay Joshi (Seattle)
Sanjay Joshi (LinkedIn) is the Industry CTO, Healthcare at Dell EMC. Based in Seattle, Sanjay's career has spanned the entire gamut of life-sciences and healthcare from clinical and biotechnology research to healthcare informatics to medical devices.
A “skunkworks” engineer, bioengineer and informaticist, he defines himself as a “non-reductionist” with a “systems view of the world.” His current focus is a systems-level understanding of Healthcare from the Edge to the Cloud via Genomics, Proteomics, Microbiomics, Imaging and IoT processes and data infrastructures.
Recent experience has included AI platforms, data management and instruments for Electronic Medical Records; Proteomics and Flow Cytometry; FDA and HIPAA validations; Lab Information Management Systems (LIMS); Translational Genomics research and Imaging. Sanjay holds a patent in multi-dimensional flow cytometry analytics. He began his career developing and building X-Ray machines. Sanjay was the recipient of a National Institutes of Health (NIH) Small Business Innovation Research (SBIR) grant and has been a consultant or co-Principal-Investigator on several NIH grants. He is actively involved in non-profit biotech networking and educational organizations in the Seattle area and beyond. Sanjay holds a Master of Biomedical Engineering from the University of New South Wales, Sydney and a Bachelor of Instrumentation Technology from Bangalore University. He completed several medical school and PhD-level courses (in Sydney and Seattle).
Sanjay will be presenting the following session: Morals from a Type 2 Diabetes dataset analytics journey....
Mayank Kejriwal is a research scientist and lecturer at the University of Southern California's Information Sciences Institute (ISI). He received his Ph.D. from the University of Texas at Austin under Daniel P. Miranker. His dissertation involved Web-scale data linking, and in addition to being published as a book, was recently recognized with an international Best Dissertation award in his field. Some of his projects at ISI, all funded by either DARPA or IARPA, include: automatically extracting information from large Web corpora and building search engines over them (the topic of his talk); 'automating' a data scientist with advanced meta-learning techniques; representing, and reasoning over, terabyte-scale knowledge graphs; combining structured and unstructured data for causal inference; constructing, embedding and analyzing networks over billion-tweet scale social media; and building a platform that makes research easy for geopolitical forecasters. His research sits at the intersection of knowledge graphs, social networks, Web semantics, network science, data integration and AI for social good. He is currently co-authoring a textbook on knowledge graphs (MIT Press, 2018), and has delivered tutorials and demonstrations at numerous conferences and venues, including KDD, AAAI, ISWC and WWW.
Mayank will be giving the following presentation: Let's embed everything!
David Kjerrumgaard (Henderson, Nevada)
David Kjerrumgaard is a Director of Solution Architecture at Streamlio and a contributor to the Apache NiFi and Apache Pulsar projects. He was formerly the Practice Director at Hortonworks, where he was responsible for the development of best practices and solutions for the professional services team, with a focus on HDF-related technologies including Kafka, NiFi, and Storm. He is a co-author of “Practical Hive: A Guide to Hadoop’s Data Warehouse System” and holds a B.S. and a Master’s degree in Computer Science from Kent State University.
David will co-present the following session: Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
Corey Lanum (LinkedIn) has a distinguished background in graph visualization. Over the last 15 years he has managed technical and business relationships with dozens of the largest defense and intelligence agencies in North America, in addition to working with many security and anti-fraud organizations in private industry. Prior to joining Cambridge Intelligence as their US Manager, Corey was helping the customers of i2 (now IBM) and SS8 to solve their most complex graph data challenges.
Corey is the author of Visualizing Graph Data from Manning Publications.
Corey will present the following sessions:
How to Destroy Your Graph Project with Terrible Visualization
Build a Visualization Application in Real Time
Chris Lu (SFBay)
Chris Lu (LinkedIn / GitHub) is a lead engineer at Uber building the knowledge graph, leveraging nearly twenty years of experience at big and small companies on databases, federated query, search, data warehousing, and infrastructure for machine learning on graphs. His side projects include the SeaweedFS distributed file system, Gleam and Glow for distributed MapReduce with Golang, and DBSight for database search.
Chris will be presenting the knowledge graph session: Statistically representative graph generation and benchmarking.
William Lyon (SFBay) @lyonwj
William Lyon is a software developer at Neo4j, the open source graph database. As an engineer on the Developer Relations team, he works primarily on integrating Neo4j with other technologies, building demo apps, helping other developers build applications with Neo4j, and writing documentation. Prior to joining Neo4j, William worked as a software developer for several startups in the real estate software, quantitative finance, and predictive API fields. William holds a Master’s degree in Computer Science from the University of Montana. You can find him online at lyonwj.com.
William will present the following graph analytics session: Operationalizing Graph Analytics With Neo4j.
Patrick McFadin (LinkedIn) is the VP of Developer Relations at DataStax, where he leads a team devoted to making users of DataStax products successful. He has also worked as Chief Evangelist for Apache Cassandra and as a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production. Prior to DataStax, he was Chief Architect at Hobsons and an Oracle DBA/Developer for over 15 years.
Patrick will be co-presenting the following session: Six Things You Need to Know about Cassandra and Kafka.
Leo Meyerovich (SFBay) @lmeyerov
Leo Meyerovich founded Graphistry to supercharge visual investigations. Graphistry, which builds on his work at UC Berkeley, connects browsers to GPUs and eliminates query writing. His most referenced work is in language-based security. Past projects include the first functional reactive web language, the first parallel web browser, GPU visual analytics, and sociological foundations of programming languages. These ideas received various research awards and are now in popular browsers and frameworks.
Leo will present the following graph session: Using GPUs & Design to Scale Visual Analysis of Digital Crime.
Jonathon Morgan (LinkedIn) is Founder and CEO at New Knowledge, a company building technologies to understand and predict human behavior. As part of his ongoing work applying quantitative methods to combating violent extremism, he served as an advisor to the White House and State Department, co-authored the ISIS Twitter Census for the Brookings Institution, and develops new technology with DARPA. Jonathon is also the co-host of Partially Derivative, an unrealistically popular podcast about data science and drinking.
Jonathon will be co-presenting the following session: Data Science: Avengers Style - how to build a data science team that fights bad guys (on the internet).
Jonathan Mugan (Austin) @jmugan
Jonathan Mugan (LinkedIn) is a researcher specializing in artificial intelligence, machine learning, and natural language processing. His current research focuses on deep learning for natural language generation and understanding. Dr. Mugan received his Ph.D. in Computer Science from the University of Texas at Austin. His thesis was centered in developmental robotics, which is an area of research that seeks to understand how robots can learn about the world in the same way that human children do. Dr. Mugan also held a post-doctoral position at Carnegie Mellon University, where he worked at the intersection of machine learning and human-computer interaction. One of the most requested speakers at the Data Day Texas conferences, he recently also spoke on the topic of NLP at the O’Reilly AI conference, and is the creator of the O’Reilly video course Natural Language Text Processing with Python. Dr. Mugan is also the author of The Curiosity Cycle: Preparing Your Child for the Ongoing Technological Explosion.
Jonathan will be presenting the following session: How to Progress from NLP to Artificial Intelligence
Benjamin Ortiz Ulloa (Washington, D.C.)
Benjamin Ortiz Ulloa is a data visualization engineer who works primarily with open source technology such as R and D3. He is an active member of Data Community DC and co-organizes the Data Visualization DC meetup. He is interested in complex systems, particularly urban networks and education/school systems.
Ben will present the following session: Exploring Graphs with R and igraph.
As head of the Design team at Expero, co-principal and business strategist, Lynn Pausic takes multitasking to the next level. By combining expertise in strategy, innovation and design, Lynn brings the breadth and depth of complex problems to light and figures out how to break them down into useful, usable and manageable pieces that form a holistic experience.
Lynn’s extensive background in user experience ranges from designing user interfaces for wearable devices, to creating enterprise software solutions and mobile UIs, to innovating scenarios beyond the 2D screen. She has ever-growing expertise with timely topics such as Big Data, the Internet of Things, UI Design Pattern Libraries and High-Performance Computing, in industries as diverse as Austin itself. Lynn’s recent clients are in agronomy, enterprise management, energy, biotechnology and other verticals.
Prior to founding Expero, Lynn earned a B.S. from Carnegie Mellon University and worked as a Director of Product Management, a Consulting Manager and a Director of Human-Computer Interaction (HCI). At Trilogy, she led the HCI team and established user-centered design as an integral part of the company’s software development process.
Lynn often speaks on user experience and design, including at Nielsen Norman Group conferences around the world. Lynn created the popular tutorial “Complex Applications & Websites” (which she co-presents with John Morkes). Lynn also has presented at Carnegie Mellon University’s HCI Institute, Cornell University’s Media Lab and ACM’s SIGCHI conference.
Lynn will present the following session: Moving Beyond Node Views.
Josh Perryman (Bryan / College Station) @joshperryman
Josh Perryman likes to play with data. Oftentimes this means implementing proprietary algorithms closer to the data for performance or scale. Sometimes it is ad-hoc investigation and analysis, a sort of exploratory querying. A few times he’s been able to leverage his experience with data engines for dramatic performance improvements. But the real joy is designing a schema for both functionality and performance, one which increases the productivity of other developers and enables a technology to solve new problems or deliver new value to the business.
Technology isn't just data, and Josh does more than just play with data. He’s worked with high performance computing (HPC) environments, taking computations from hours to minutes or seconds. He has built visualizations which deliver new insights into complex data domains. He’s managed technology personnel, both directly and indirectly, to deliver technology solutions. Josh has put together more types of technology components, software and hardware, than can be counted, because one of his fortes is solving problems by building sustainable systems.
Josh will present the following session: And Bad Mistakes, I’ve Made a Few: Experience from the Trenches as a Graph Data Architect.
Davin Potts (Austin)
Davin Potts spends his working hours as a scientific software consultant at Appliomics, volunteers as a Python core developer, and along the way attempts to make open source data science tools (especially, but not only, KNIME) ever more productive and useful. After spending a decade in pharmaceutical drug discovery R&D, Davin held positions as Chief Data Scientist at Continuum Analytics (now Anaconda) based in Austin, Chief Science Officer at Stipple based in San Francisco, and co-founder at KNIME.com based in Zurich. Davin has attended every Data Day Texas held to date.
Davin will present the following session: Choosing Sides When Choosing Tools Hurts.
Karthik Ramasamy (San Francisco) @karthikz
Karthik Ramasamy (LinkedIn) is the co-founder of Streamlio, a company focused on building next-generation real-time infrastructure. Before Streamlio, Karthik was the engineering manager and technical lead for real-time infrastructure at Twitter, where he co-created Twitter Heron. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He co-founded Locomatix, a company specializing in real-time stream processing on Hadoop and Cassandra using SQL, which was acquired by Twitter. Before Locomatix, he had a brief stint with Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases, and high availability solutions for network routers that are widely deployed on the internet. Before joining Juniper, he worked extensively at the University of Wisconsin in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were later spun off as a company acquired by Teradata. Karthik has authored several publications and patents, as well as the book Network Routing: Algorithms, Protocols and Architectures. He has a Ph.D. in computer science from the University of Wisconsin, Madison with a focus on big data and databases.
Karthik will co-present the following session: Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge
and the following session: Apache Pulsar: Next Generation Cloud Native Messaging Streaming System
Steve Sarsfield, VP of Product at Cambridge Semantics / AnzoGraph, is a longtime industry expert with experience at Talend, Vertica and now Cambridge Semantics. He is also the author of the book The Data Governance Imperative. Steve has more than 20 years of experience in databases, analytics, information quality, big data and data governance.
Steve will present the following session: Benchmarking a Graph OLAP database to complement OLTP systems.
Dr. Petra Selmer is a member of the Query Languages Standards and Research group at Neo4j, undertaking research into graph query languages and language standards, with the aim of evolving and standardizing property graph querying. She also supports the openCypher project at www.opencypher.org, and was previously part of the team designing and optimizing Neo4j’s Cypher query engine. She worked for many years as a consultant and developer in a variety of domains and roles, and holds a PhD in Computer Science from Birkbeck, University of London, where she researched flexible querying of graph-structured data.
Dr. Selmer will present the following session: GQL: Towards a Standardized Property Graph Query Language.
Dr. Juan Sequeda is the co-founder of Capsenta, a spin-off from his research, and the Senior Director of Capsenta Labs. He holds a PhD in Computer Science from the University of Texas at Austin. His research interests lie at the intersection of Logic and Data, in particular between the Semantic Web and Relational Databases, for data integration, ontology-based data access and semantic/graph data management. Juan is the recipient of the NSF Graduate Research Fellowship, 2nd Place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org, the Best Student Research Paper award at the 2014 International Semantic Web Conference, and the 2015 Best Transfer and Innovation Project award from the Institute for Applied Informatics. Juan is the General Chair of AMW 2018, was the PC chair of the ISWC 2017 In-Use track, is on the Editorial Board of the Journal of Web Semantics, is a member of multiple program committees (ISWC, ESWC, WWW, AAAI, IJCAI) and is co-creator of the Consuming Linked Data Workshop series. Juan is a member of the Graph Query Languages task force of the Linked Data Benchmark Council (LDBC) and has also been an invited expert member and standards editor at the World Wide Web Consortium (W3C).
Juan will present the following sessions:
Building Enterprise Knowledge Graphs: Lessons Learned from the Trenches, and
Design Knowledge Graphs Simply: An Introduction to Gra.fo
Joshua Shinavier (San Francisco) @joshsh
Joshua Shinavier is a primordial being of the graph database domain, and holds a PhD in Web science from RPI’s Tetherless World Constellation. He contributed to the first common APIs for graph databases, the original TinkerPop query language which influenced Gremlin, and the first tools which aligned the property graph and RDF data models, starting with neo4j-rdf-sail in 2008. Other graphy adventures have included Lisp hacking at Franz Inc. and Java hacking at Aurelius. As of 2017, he is part of the knowledge graph team at Uber, where he also leads a company-wide effort to unify schemas across RPC, streaming, and storage. He feels, now as ever, that the research, business, and open source communities have a lot to learn from each other with respect to graphs and knowledge representation.
Josh will present the following session: A Graph is a Graph is a Graph: Equivalence, Transformations, and Composition of Graph Data Models.
Gabriel Tanase (Austin)
Gabriel Tanase is the Director of the Computing Platform at Graphen, a novel startup that combines graph and machine learning to provide better insight into a company’s data. Prior to Graphen he was a Research Staff Member at IBM T.J. Watson Research Center, where he worked on systems for large-scale graph analytics and runtime systems for parallel programming languages. He graduated with a PhD in Computer Science from Texas A&M University. His PhD work was on parallel data structures in the context of a C++ parallel programming library called STAPL. He received his Bachelor of Science from the Polytechnic University of Bucharest, Romania in 1999 and Master of Science from the same University in 2000. His research interests are in the area of graph databases, high performance computing, parallel programming languages and libraries, parallel algorithms and generic programming.
Gabriel will present the following session: Predicting new edges in large scale dynamic graphs.
Dominik Tomicevic (London) @dtomicevic
Dominik Tomicevic received his bachelor's degree in computer science from the University of Zagreb. Over the years, he has participated in various regional data science competitions and accumulated a number of awards. In 2011, he was selected as one of four people in the world to receive the Microsoft Imagine Cup Grant, awarded by Bill Gates, for developing game-changing technologies in data processing. In 2016, he founded Memgraph, a venture-backed graph database company focusing on high-performance real-time connected data processing. Memgraph is a Techstars company, backed by some of the top technology investors and entrepreneurs in the UK and the US. In 2017, Forbes named Dominik one of the top 10 Technology CEOs to watch in the UK.
Dominik will present the following session: Introduction to Memgraph.
Michael Uschold (Seattle, WA) @UscholdM
Michael Uschold, Senior Ontology Consultant at Semantic Arts, has over twenty-five years’ experience in developing and transitioning semantic technology from academia to industry. He pioneered the field of ontology engineering, co-authoring the first paper and giving the first tutorial on the topic in 1995 in the UK.
As a senior ontology consultant at Semantic Arts since October 2010, Michael trains and guides clients to better understand and leverage semantic technology using knowledge graphs. He has built commercial enterprise ontologies in digital asset management, finance, healthcare, legal research, consumer products, electrical devices, manufacturing and corporation registration. More recently he has focused on semantic application development using SPARQL for application code and R2RML for converting relational data into a knowledge graph.
During 2008-2009, Uschold worked at Reinvent on a team that developed a semantic advertising platform that substantially increased revenue. As a research scientist at Boeing from 1997 to 2008, he defined, led and participated in numerous projects applying semantic technology to enterprise challenges. He is a frequent invited speaker and panelist at national and international events, and serves on the editorial board of the Applied Ontology Journal. He received his Ph.D. in AI from Edinburgh University in 1991 and an M.Sc. in Computer Science from Rutgers University in 1982.
Michael will present the following session: Breaking Down Silos with Knowledge Graphs.
Ted Wilmes (Oklahoma City) @trwilmes
Ted Wilmes, Data Architect at Expero, is a graduate of Trinity University where he studied computer science and art history. He started his professional career at a not-for-profit research and development institution where he performed contract software development work for a variety of government and commercial clients. During this time he worked on everything from large enterprise systems to smaller, cutting edge research and development projects. One of the most rewarding parts of each of these projects was the time spent collaborating with the customer.
As Ted’s career continued, he moved on to an oil and gas startup and dug deeper into the data side of software development, developing a keen interest in how databases work and how to eke as much performance out of them as possible. During this time he became interested in the application of graph databases to certain problem sets. Today, at Expero, Ted enjoys putting his deep knowledge of transactional graph computing to work as he helps customers of all types navigate the burgeoning property graph database landscape.
Outside of work, Ted enjoys spending time with his family out-of-doors, listening to and playing loud music, and contributing to the Apache TinkerPop project as a committer and PMC member.
Ted will present the following session: High Performance JanusGraph Batch & Stream Loading.
Chris Wixon (Atlanta)
Chris Wixon, MD, is a practicing surgeon who holds the belief that improving the quality and efficiency of clinical information has the potential to have a profound effect on healthcare delivery, costs and overall outcomes.
Chris will present the following session: Taming of the Shrew: Using a Knowledge Graph to capture structured Health Information Data.
Dr. Mingxi Wu (Redwood City)
Dr. Mingxi Wu is the VP of Engineering at TigerGraph, responsible for product development, quality assurance, and release. Mingxi excels at engineering team building and tech leadership. Previously, he worked at Ad-Tech startup Turn (acquired by Amobee), the Oracle Relational Database Optimizer Group, and the Microsoft SQL Server Manageability Group. He has won research awards from SIGMOD, VLDB and KDD, and holds patents on big data and pending patents on graph management. Mingxi received his PhD from the University of Florida, where he specialized in databases and data mining.
Mingxi will present the following session: Eight Prerequisites of a Graph Query Language.
Barry Zane (San Diego)
Barry Zane is VP of Engineering at Cambridge Semantics. He brings substantial product development experience and industry expertise in building large-scale products for data analysis. Prior to Cambridge Semantics, Barry was co-founder and CEO of SPARQL City, where he served as VP of Technology; SPARQL City's high-performance, scalable graph database technology was acquired by Cambridge Semantics and integrated within its Smart Data Lake and other offerings. Previously, Barry was co-founder and CTO of ParAccel, a high-performance, scalable relational database system that provides the basis for Amazon Redshift. ParAccel was acquired by Actian Corporation as the Matrix product line. He was a co-founder and VP of Technology & Architecture at Netezza, which, after a successful IPO, was acquired by IBM. Before Netezza, Barry was CTO of Applix, Inc., which was also later acquired by IBM. Barry began his career at Prime Computer as a hardware engineer and later held various roles in software development and management. Barry holds a degree in Electrical Engineering from Carnegie Mellon University.
Barry will present the following session: Choosing the Right Graph Architecture for Your Use-Case - Operations vs. Analytics.
Adam Zegelin (San Jose)
Adam Zegelin is the Senior VP of Engineering and Co-Founder of Instaclustr.
Adam will present the following session: Broad-Use Analytics on Big Graphs.
Tom Zeppenfeldt (Netherlands)
Tom Zeppenfeldt is the founder of Graphileon, supplier of a graph-based application development platform. After a career in international development, a sector in which information requirements change rapidly, he started developing tools that allowed non-developers to explore data sets and build flexible applications.
Tom will present the following session: Your application is a graph too!