Nov 29, 2016 - Survey on knowledge representation and reasoning based on statistical machine learning


# Survey on knowledge representation and reasoning based on statistical machine learning

For knowledge graphs, different groups take different angles of attack; most of the work is application-driven deep learning training. See the Tasks section below.

The knowledge base framework that Jason Weston et al. built around 2013-2014 while at Google is based on embeddings (entities and relations are all dense vectors): a model (TransE) under one theoretical framework, plus applications (QA, RE).

Translating Embeddings for Modeling Multi-relational Data (theory) introduces TransE for learning vector representations of entities and relations. The approach is an energy-based embedding model: the basic idea is that a triple (e1, r, e2) should satisfy e1 + r ≈ e2, rather than building on word2vec. Two companion papers from the same period refine the approach for relation extraction and for question answering. Notably, their relation-extraction model can correctly infer, for a given pair of entities, relations that never appear in the training set.
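
To make the e1 + r ≈ e2 idea concrete, here is a minimal NumPy sketch of the TransE score and the margin-based ranking loss; the entity names, dimensionality, and margin below are illustrative choices of mine, not taken from the paper's code.

```python
import numpy as np

def transe_score(e1, r, e2):
    """TransE energy of a triple (e1, r, e2): lower means more plausible.
    The model wants e1 + r ≈ e2, so the score is the distance ||e1 + r − e2||."""
    return np.linalg.norm(e1 + r - e2, ord=1)

def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin-based ranking loss between a true triple and a corrupted one."""
    return max(0.0, margin + pos_score - neg_score)

# Toy usage: in practice the embeddings are learned by SGD over the whole KB.
dim = 50
rng = np.random.default_rng(0)
emb = {name: rng.normal(scale=0.1, size=dim)
       for name in ["paris", "france", "berlin", "capital_of"]}

pos = transe_score(emb["paris"], emb["capital_of"], emb["france"])
neg = transe_score(emb["berlin"], emb["capital_of"], emb["france"])  # corrupted head
print(margin_loss(pos, neg))
```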

This framework lays out a fairly complete line of work: learning dense vectors for entities and relations; knowledge base completion (correctly and plausibly inferring/completing relations that do not appear in the training set); and dialogue (the ideal downstream application).

A lot of detailed follow-up work has unfolded around these three points since 2014. The NIPS 2015 workshop on Reasoning, Attention and Memory (RAM) pointed out several directions for breakthroughs. As the workshop title suggests, the triple (reasoning, attention, memory) is at the core of the problem: reasoning depends on memory, and attention gives the agent a sensible mechanism for reading memory. Reasoning over a knowledge graph relies on the same triple.

Researchers / Research Groups:

Facebook:

Jason Weston and Antoine Bordes, Sumit Chopra

Jason Weston et al.'s KB framework from around 2014, built while at Google, is based on embeddings (entities and relations are all dense vectors): a model (TransE) under one theoretical framework, plus applications (QA, RE).

Translating Embeddings for Modeling Multi-relational Data (theory).

Open Question Answering with Weakly Supervised Embedding Models (application: question answering [QA]); reasons over simple relations within Freebase.

Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction (application: relation extraction [RE]):

It employs not only weakly labeled text mention data, as most approaches do, but also leverages triples from the known KB. The model thus learns the plausibility of new triples by generalizing from the KB, even though those triples are not present in it.

Weakly supervised means that for each pair of entities detected in the text, all relation mentions associated with that pair are labeled with all the relationships connecting the pair in the KB, whether those relationships are actually expressed by the mention or not.
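
A toy sketch of what this weak-labeling heuristic looks like; the entities, relations, and sentence below are made up purely for illustration.

```python
# Every textual mention of an entity pair inherits ALL relations that connect
# the pair in the KB, whether or not the sentence actually expresses them.
kb_triples = {
    ("barack_obama", "born_in", "honolulu"),
    ("barack_obama", "lived_in", "honolulu"),
}

def weak_labels(e1, e2, kb):
    """Return every KB relation holding between e1 and e2."""
    return {r for (h, r, t) in kb if h == e1 and t == e2}

mention = "Obama visited Honolulu last week."
labels = weak_labels("barack_obama", "honolulu", kb_triples)
print(labels)  # {'born_in', 'lived_in'} -- noisy: the sentence expresses neither
```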

CMU

Read the Web

The goal is to build a never-ending machine learning system that acquires the ability to extract structured information from unstructured web pages. If successful, this will result in a knowledge base (i.e., a relational database) of structured information that mirrors the content of the Web. This system is called NELL (Never-Ending Language Learner).

Eric P. Xing group(http://www.cs.cmu.edu/~epxing/publications-2016.html)

  1. Harnessing Deep Neural Networks with Logic Rules (a distillation method that transfers the structured information of logic rules into the weights of neural networks; see the sketch after this list)
  2. Deep Neural Networks with Massive Learned Knowledge (Distillation model)
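
As a rough illustration of the distillation idea in the first paper, here is a minimal sketch of a teacher-student objective that mixes ground-truth labels with a rule-regularized teacher distribution. How the teacher distribution q is actually built from the logic rules is omitted, and the function name, imitation weight, and toy numbers are my own illustrative choices.

```python
import numpy as np

def distill_loss(p_student, y_true, q_teacher, pi=0.5, eps=1e-12):
    """Teacher-student distillation objective: the student is trained to fit
    both the hard ground-truth labels and a 'teacher' distribution (assumed
    here to be the student's prediction projected toward the rule-constrained
    region), weighted by the imitation parameter pi."""
    ce_true = -np.sum(y_true * np.log(p_student + eps))      # fit ground truth
    ce_teacher = -np.sum(q_teacher * np.log(p_student + eps))  # imitate teacher
    return (1.0 - pi) * ce_true + pi * ce_teacher

# Toy usage with made-up distributions over two classes.
p = np.array([0.7, 0.3])   # student prediction
y = np.array([1.0, 0.0])   # one-hot ground truth
q = np.array([0.9, 0.1])   # rule-regularized teacher (assumed given)
print(distill_loss(p, y, q))
```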

Stanford

NLP group

  1. Reasoning With Neural Tensor Networks for Knowledge Base Completion (see the scoring-function sketch after this list)
  2. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors
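
For reference, a minimal sketch of the Neural Tensor Network scoring function from the first paper, written from memory rather than from the authors' code; parameter shapes and the random toy inputs are illustrative only.

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """NTN plausibility score for a triple (e1, R, e2). Each relation R has a
    tensor W (d x d x k), a matrix V (k x 2d), a bias b (k) and a vector u (k):
    score = u^T tanh(e1^T W^[1:k] e2 + V [e1; e2] + b)."""
    k = W.shape[2]
    bilinear = np.array([e1 @ W[:, :, i] @ e2 for i in range(k)])  # k bilinear slices
    linear = V @ np.concatenate([e1, e2])                          # standard layer
    return u @ np.tanh(bilinear + linear + b)

# Toy usage with random parameters for a single relation.
d, k = 4, 3
rng = np.random.default_rng(0)
e1, e2 = rng.normal(size=d), rng.normal(size=d)
score = ntn_score(e1, e2,
                  rng.normal(size=(d, d, k)), rng.normal(size=(k, 2 * d)),
                  rng.normal(size=k), rng.normal(size=k))
print(score)
```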

HAZY (research) -> deepdive(opensource) -> https://lattice.io/ (commercial)

The Hazy project explores integrating statistical processing techniques with data processing systems, with the goal of making such systems easier to build, deploy, and maintain. It is currently used mainly for knowledge-base construction.

We are building several applications, including systems that read large amounts of text and answer sophisticated questions (see WiscI and GeoDeepDive), as well as general primitives for data analytics that are now incorporated in products from Oracle and Pivotal. Additionally, some of our ideas have helped to find neutrinos with IceCube (see IceCube).

DeepDive, a general-purpose statistical inference system, has been released.

Google:

Google’s Knowledge Vault

Constructing knowledge bases from the Web.

Microsoft:

Tasks

conversation/dialog/chat: Knowledge Graphs and Linked Big Data Resources for Conversational Understanding

recommendation: Collaborative Knowledge Base Embedding for Recommender Systems

semantics: Deep learning of knowledge graph embeddings for semantic parsing of Twitter dialogs

Datasets

  1. Freebase triplet relationships
  2. WordNet
  3. bAbI (towards the goal of automatic text understanding and reasoning)

MISC

  1. nips-2015-reasoning-workshop (important)

  2. Memory Networks for Language Understanding, ICML Tutorial 2016

  3. building-the-linkedin-knowledge-graph

  4. Linked Data: connecting various kinds of information from different topic domains (such as people, books, music, movies, and geographical locations) in a unified global data space

Nov 20, 2016 - QUASI-RECURRENT NEURAL NETWORKS


# QUASI-RECURRENT NEURAL NETWORKS

This paper proposes QRNNs mainly to address the expensive computation of deep stacked LSTMs. There are some interesting results: a decent improvement on character-level MT and a small drawback on language modeling, all while being roughly 10x faster than the deep stacked LSTM variant.

features claimed:

  1. stacked QRNNs have better predictive accuracy than stacked LSTMs of the same hidden size
  2. neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, with a minimalist recurrent pooling function that applies in parallel across channels; up to 16 times faster at train and test time.

experiments on QRNN:

  1. language modeling
  2. sentiment classification
  3. character-level neural machine translation

Model

(Figure: model comparison)

Stacked LSTMs require expensive computation, but is it necessary to stack the same LSTM architecture for an RNN to work? This model tries to separate the LSTM into the part of the signal that can be processed in full parallelism and the part that actually needs the Markov property.

The gist of the paper's core contribution is the family of pooling functions below, where ⊙ denotes elementwise multiplication.

### f-pool:

The simplest option, which Balduzzi & Ghifary (2016) term "dynamic average pooling", uses only a forget gate:

```
h_t = f_t ⊙ h_{t−1} + (1 − f_t) ⊙ z_t
```

### fo-pool:

The function may also include an output gate:

```
c_t = f_t ⊙ c_{t−1} + (1 − f_t) ⊙ z_t
h_t = o_t ⊙ c_t
```

### ifo-pool:

The recurrence relation may include independent input and forget gates:

```
c_t = f_t ⊙ c_{t−1} + i_t ⊙ z_t
h_t = o_t ⊙ c_t
```
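
To make the conv-then-pool split concrete, here is a minimal NumPy sketch of a single QRNN layer with fo-pooling; the function name, shapes, and filter width k=2 are illustrative assumptions of mine, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def qrnn_fo_pool(X, Wz, Wf, Wo, k=2):
    """Minimal QRNN layer: masked convolution over timesteps + fo-pooling.

    X:  (T, d_in) input sequence
    Wz, Wf, Wo: (k * d_in, d_out) weights for candidate, forget, output gates.
    Returns H: (T, d_out) hidden states.
    """
    T, d_in = X.shape
    d_out = Wz.shape[1]

    # Masked ("causal") convolution: each timestep only sees the current and
    # previous k-1 inputs, so this part is fully parallel across timesteps.
    X_pad = np.vstack([np.zeros((k - 1, d_in)), X])
    windows = np.stack([X_pad[t:t + k].reshape(-1) for t in range(T)])  # (T, k*d_in)

    Z = np.tanh(windows @ Wz)   # candidate vectors z_t
    F = sigmoid(windows @ Wf)   # forget gates f_t
    O = sigmoid(windows @ Wo)   # output gates o_t

    # fo-pooling: the only sequential part, elementwise across channels.
    H = np.zeros((T, d_out))
    c = np.zeros(d_out)
    for t in range(T):
        c = F[t] * c + (1.0 - F[t]) * Z[t]   # c_t = f_t ⊙ c_{t−1} + (1 − f_t) ⊙ z_t
        H[t] = O[t] * c                      # h_t = o_t ⊙ c_t
    return H

# Usage sketch (shapes are illustrative).
T, d_in, d_out, k = 10, 8, 16, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d_in))
Wz, Wf, Wo = (rng.normal(size=(k * d_in, d_out)) * 0.1 for _ in range(3))
print(qrnn_fo_pool(X, Wz, Wf, Wo, k).shape)  # (10, 16)
```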