NLP Day at Data Day Texas

At Data Day Texas 2016, we launched the inaugural NLP Day Texas - a conference within a conference.
Based on community demand, we are expanding NLP Day for 2017. This is not a separate event.
Your registration for Data Day Texas includes the 20+ NLP sessions and workshops as well.

Below is a list of the currently confirmed NLP Day sessions. We are awaiting final abstracts from several of the confirmed speakers. For a complete list of NLP Day speakers, visit the Data Day Speakers page.

NEW - How to Progress from NLP to Artificial Intelligence

Jonathan Mugan (DeepGrammar)

Why isn’t Siri smarter? Our computers currently have no commonsense understanding of our world. This deficiency forces them to rely on parlor tricks when processing natural language because our language evolved not to describe the world but rather to communicate the delta on top of our shared conception. The field of natural language processing has largely been about developing these parlor tricks into robust techniques. This talk will explain these techniques and discuss what needs to be done to move from tricks to understanding.

NEW - Exploring Question-Answering System: Named Entity Recognition & Sentence Similarity Measure in Practice

Jacob Su Wang - OJO Labs
Named Entity Recognition (NER) and Sentence Similarity Measure (SSM) are two primary pillars in building a sound and successful Question-Answering System. In the current project at OJO Labs, Inc., we explore various prominent models proposed over the years, in an effort to evaluate their performance and practicality in “real life”, and demonstrate our results on the open dataset ATIS (Airline Travel Information System).
For the NER tasks, we specifically make use of Recurrent Neural Networks (RNN) and Conditional Random Fields (CRF), which have been shown to produce state-of-the-art performance (Mesnil et al. 2013). We show that whereas the RNN outperforms the CRF given large amounts of data and comparable training iterations, the CRF is the superior choice when data is scarce, offering better performance and much simpler, more intuitive feature selection. We also retool a model from distributional semantics (Erk 2012, and elsewhere), selectional preference strength (Resnik 1996; Erk 2007), to discover and evaluate the predictive power of various features.
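
For readers new to the topic, a CRF slot tagger of the kind described above can be sketched in a few lines using the sklearn-crfsuite library. This is an illustrative sketch with toy ATIS-style data and hypothetical feature names, not the speakers' code.

    # A minimal CRF sequence-tagging sketch with hand-crafted features.
    # Illustrative only: toy data, hypothetical feature names and labels.
    import sklearn_crfsuite

    def token_features(sentence, i):
        """Simple per-token features for token i of a tokenized sentence."""
        word = sentence[i]
        return {
            "word.lower": word.lower(),
            "word.isdigit": word.isdigit(),
            "prefix3": word[:3].lower(),
            "suffix3": word[-3:].lower(),
            "prev.lower": sentence[i - 1].lower() if i > 0 else "<BOS>",
            "next.lower": sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>",
        }

    # Toy ATIS-style training data: tokenized queries with BIO slot labels.
    train_sents = [
        (["flights", "from", "austin", "to", "boston"],
         ["O", "O", "B-fromloc", "O", "B-toloc"]),
        (["show", "me", "fares", "to", "denver"],
         ["O", "O", "O", "O", "B-toloc"]),
    ]
    X_train = [[token_features(toks, i) for i in range(len(toks))] for toks, _ in train_sents]
    y_train = [labels for _, labels in train_sents]

    # Train a linear-chain CRF and tag an unseen query.
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
    crf.fit(X_train, y_train)
    test = ["flights", "from", "dallas", "to", "chicago"]
    print(crf.predict([[token_features(test, i) for i in range(len(test))]]))

The feature dictionaries make the abstract's point concrete: with a CRF, the modeler chooses features like these directly, whereas an RNN must learn its representations from (much more) data.
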
For SSM, we implement and evaluate several representative models (Li et al. 2006; Wan et al. 2006; Ji & Eisenstein 2013) alongside various n-gram and word-overlap baselines, and apply the models to the popular benchmark Microsoft Research Paraphrase Corpus (MSRP; Dolan et al. 2004), where we find supporting evidence for the practical superiority of a balanced combination of knowledge-based and syntactic-semantic systems.
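
The n-gram / word-overlap baselines mentioned above are simple enough to show in full; a Jaccard overlap over word n-grams is one common formulation (a sketch, not the speakers' implementation).

    # A word-overlap sentence-similarity baseline: Jaccard similarity over
    # word n-grams of two whitespace-tokenized, lowercased sentences.
    def ngrams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def overlap_similarity(s1, s2, n=1):
        a, b = ngrams(s1.lower().split(), n), ngrams(s2.lower().split(), n)
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)

    print(overlap_similarity("show me flights to Boston", "list all flights to Boston"))

Baselines like this are what the knowledge-based and syntactic-semantic models have to beat on MSRP.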

NEW - Creating Knowledge Bases from Text in the Absence of Training Data

Sanghamitra Deb - Accenture
A major part of the Big Data collected in most industries is in the form of unstructured text. Some examples are log files in the IT sector, analyst reports in the finance sector, patents, laboratory notes and papers, etc. Among the challenges of gaining insights from unstructured text are converting it into structured information and generating training sets for machine learning. Typically, training sets for supervised learning are generated through human annotation. For text, this involves subject matter experts reading several thousand to a million lines of text, which is very expensive and not always feasible; hence it is important to solve the problem of generating training sets before attempting to build machine learning models. Our approach combines rule-based techniques with small amounts of SME time to bypass the time-consuming manual creation of training data. The end goal is to create knowledge bases of structured data that are used to derive insights about the domain. I have applied this technique to several domains, including data from drug labels and medical journals, log data generated through customer interaction, and the generation of market research reports. I will talk about the results in some of these domains and the advantages of this approach.
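
The rule-plus-SME idea can be made concrete with a small sketch (hypothetical, not Accenture's system): simple patterns assign provisional labels to sentences, a subject matter expert spot-checks a sample, and the surviving examples seed a training set.

    # Weak, rule-based labeling to bootstrap a training set without manual
    # annotation. Patterns and labels are hypothetical examples for drug-label text.
    import re

    RULES = [
        (re.compile(r"\b\d+(\.\d+)?\s*mg\b", re.I), "DOSAGE"),
        (re.compile(r"\bdo not (take|use)\b", re.I), "CONTRAINDICATION"),
        (re.compile(r"\bmay cause\b", re.I), "SIDE_EFFECT"),
    ]

    def weak_label(sentence):
        """Return the first rule label that fires, or None if no rule matches."""
        for pattern, label in RULES:
            if pattern.search(sentence):
                return label
        return None

    sentences = [
        "Take 200 mg twice daily with food.",
        "Do not take this product with alcohol.",
        "This medication may cause drowsiness.",
        "Store at room temperature.",
    ]

    # Rule hits become candidate training examples; an SME reviews a sample
    # before they are used to train a model or populate a knowledge base.
    labeled = [(s, weak_label(s)) for s in sentences]
    training_set = [(s, lbl) for s, lbl in labeled if lbl is not None]
    print(training_set)
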


Christopher Moody of Stitch Fix taking the audience into the weeds with recent NLP techniques at Data Day Texas 2016.

Speakers already confirmed for the upcoming NLP Day @ Data Day Texas 2017 are:
Michelle Casbon from Qordoba
Nick Gaylord from Crowdflower
Stefan Krawczyk at Stitch Fix (@stitchfix)
Rob McDaniel, Senior Data Scientist at Live Stories
Gabor Melli, Senior Data Scientist at Live Stories
Jana Thompson, R&D Technology Associate Principal at Accenture Labs (@accenturelabs)
(See the full list of confirmed Data Day speakers)


Charity Majors (@mipsytipsy) gave the keynote to a full house at Data Day 2016. This is what a sold-out event looks like. Buy your tickets now and save money.

 


Visiting SF? Quite a few Data Day alumni joined us for the recent Bay Area NLP Happy Hour. See you at the next one?


Jonathan Mugan, CEO of Deep Grammar, speaking at the recent NLP Community Day in Austin.
We host many NLP events in Austin throughout the year. Join the Austin Text / NLP Meetup to stay in the loop.

NLP @ Data Day Texas 2016 - Interviews

Rob Munro gave the NLP keynote at Data Day Texas 2016.

Michelle Casbon found her first NLP job at Data Day Texas. She came back as a speaker for 2016.
Michelle is currently Director of Data Science at Qordoba. NLP Day is a great place to find your next gig.